The Markov chain model, with extensions to cover the phenomena
of arrivals and departures, was applied to a population of savings accounts, 
in a savings institution, to forecast the size distribution, total number 
of accounts and total amount of savings of the population. The stochastic 
processes governing the behavior of the population were first assumed to 
be time stationary. This assumption was then relaxed and an econometric 
model was used to predict future values of the parameters of the
nonstationary model. Both models were validated by comparing predicted
size distributions, total number of accounts and total amount of savings 
against observed values. The chi square goodness of fit test was used 
in the comparison. The fundamental matrix of the stationary model was 
also used to predict the equilibrium distribution and related measures of 
the population. 
I. INTRODUCTION



A. PURPOSE 

It is the purpose of this thesis to develop and evaluate two analytical 
models which can be used to forecast the structure of a population of savers 
and the level of savings of a savings institution. The population of savers 
is divided into a finite number of classes and the structure is the distri- 
bution of savers among the classes. 

B. BACKGROUND 

While it is difficult, if not impossible, to predict the future behavior 
of an individual it is believed that the aggregate behavior of a population 
is less erratic and, therefore, more amenable to analysis and prediction. 
Assuming that a large population has considerable inertia, current trends 
can be used to project into the future. 

The rate of change of the structure and characteristics of a population
can, at times, be considered to be dependent upon its size,
external forces which affect the members of the population and the 
response to these forces. 

In the case of the population of savers in savings institutions, it 
has been observed that members of this population are not very responsive 
to changes in economic conditions. Thus, during periods of constant 
rate of expansion or contraction in the business cycle, external forces 
affecting this population may be considered to be constant and a time
stationary Markov chain model may be used to study the behavior of the
population.

However, during turning points in the business cycle or periods 
of rapid economic changes, external forces may be sufficiently large to 
affect the savings pattern of the savers so that the stationarity assumption 
may no longer hold. Under these circumstances a more comprehensive
model which takes into consideration the effects of external conditions 
on the behavior of the savers would be required. The major problem in 
constructing such a model would be in discovering the factors which 
affect the population, measuring the effect of these factors and the effects 
of interaction between various factors. 

The effect of competition between various savings institutions for 
a larger share of the savers' market could not be modeled because of the 
lack of data. However, it is believed that, in the short run, the savers' 
market is in a state of equilibrium and the share of the market captured 
by a savings institution is relatively constant. Thus it can be assumed
that competition does not affect the savers' behavior to such an extent
that neglecting its effect would render a model inadequate.

C. REVIEW OF MARKOV CHAIN MODELS 

The basic model used in this study was introduced by A. A. Markov
(1856-1922) around 1907. This model was first applied in economics to
the analysis of income and wage distributions by Solow [21] in 1951.
The same model was used by Hart and Prais [12] in 1956 in a study of
business concentration.

The model assumes that a population of entities can be classified
into a finite number of classes. The population is observed at equidistant
time points. The number of entities observed to move from one
class to another is assumed to be generated by a stochastic process.
The probability of transition is assumed to depend only on the class the
entity is in at the current time, and not on where it had previously
been. This process of change can be completely described by
an m x m transition matrix, $P = [p_{ij}]$. The element $p_{ij}$ is the probability
that an entity currently in the ith class will be found in the jth class
after one time period. If the stochastic process is time stationary then
the matrix does not change with time.
In most of the research studies using this model the general procedure
has been to observe some pattern of change over time and, assuming
that the stochastic process is time stationary, estimate the transition 
probabilities and project the future change. 

Projection of the expected number of entities in each class can be
computed as follows. Let the number of entities in each class at time t be
$n_1^t, n_2^t, \ldots, n_m^t$. If the transition probabilities are known, then
the expected numbers of entities moving from the ith class into classes
1, 2, ..., m are $p_{i1} n_i^t, p_{i2} n_i^t, \ldots, p_{im} n_i^t$.

The expected number of entities in each class at time t+1 can be
found by adding up all the entities that have moved into the class and
those that did not move out. Thus

$$E(n_j^{t+1}) = \sum_{i=1}^{m} p_{ij}\, n_i^t, \qquad j = 1, 2, \ldots, m$$

In matrix notation the above expressions can be compactly written as:

$$N'^{\,t+1} = N'^{\,t} \times P$$

where $N'^{\,t} = (n_1^t\ n_2^t\ \ldots\ n_m^t)$, a 1 x m vector

$N'^{\,t+1} = (n_1^{t+1}\ n_2^{t+1}\ \ldots\ n_m^{t+1})$, a 1 x m vector

P = matrix as defined earlier.
$N^{t+2}$ can be computed by replacing $N^{t}$ by $N^{t+1}$ in the above expression.
This is equivalent to multiplying $N^{t}$ by P x P. The distribution after
k periods can thus be obtained by multiplying $N^{t}$ by P raised to the kth
power.
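The k-period projection described above can be sketched in a few lines of Python. This is only an illustration: the three-class matrix and the starting counts are hypothetical values chosen for the example, not data from this study, and the function names are not part of the original work.

```python
def row_times_matrix(n, P):
    """Multiply the 1 x m row vector n by the m x m matrix P."""
    m = len(P)
    return [sum(n[i] * P[i][j] for i in range(m)) for j in range(m)]


def project(n0, P, k):
    """Expected class counts after k periods, i.e. n0 x P^k."""
    n = list(n0)
    for _ in range(k):
        n = row_times_matrix(n, P)
    return n


# Hypothetical example: three classes, each row of P sums to one.
P = [[0.8, 0.2, 0.0],
     [0.1, 0.7, 0.2],
     [0.0, 0.3, 0.7]]
n0 = [100.0, 50.0, 25.0]

n1 = project(n0, P, 1)   # one period ahead
n5 = project(n0, P, 5)   # five periods ahead
```

Because every row of P sums to one, the total number of entities is unchanged by the projection, which is exactly the fixed-population limitation of the basic model.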

This basic model has two major limitations. First, it assumes that 
the total number of entities in the system is fixed. This assumption has 
been violated frequently in practical applications of this model as changes 
due to entities entering the system, leaving it or losing identity by merging 
are the rule rather than exception. Second, the assumption that the 
stochastic process is time stationary is untenable over long periods. 
Changes in numerous exogenous variables such as wage rates, technology 
and legal requirements are likely to result in changes in the transition 
probabilities.

Adelman [1], in 1958, overcame the first limitation by introducing
the concept of a reservoir of potential entrants, from which entrants may
come and to which exits may go. There was an operational difficulty in
estimating the size of the population of potential entrants. However,
Adelman pointed out that the exact size of this population need not be
known if one was dealing with the proportion of entities in each class
rather than with the exact number of entities. She therefore assumed
the size of the reservoir to be 100,000. The reason given for this
choice was that it must be large relative to the number of entities in the
system.
Stanton and Kettunen [22] in 1967 confirmed Adelman's observation
but went on to demonstrate that: "The number of potential entrants to an
industry or to a population has a definite and measurable effect on subsequent
projections made for that distribution when Markov processes are
used." Thus, if the number of entities in each class is required, an
arbitrary choice of the size of the population of entrants will not work.

Duncan and Lin [9] in 1972 proposed that arrivals could be treated
as a separate stochastic process. The entry of an entity into the system
is viewed as a two-stage process: first, arriving into the system, then
entering into a particular class. One could then estimate the parameters
of the entire process by observing the arrivals, the distribution of arrivals
among the classes and the transitions between classes separately. They
denoted the data by Z, which was composed of the number of arrivals into
each class (A) and the number of transitions between each class (X). The
set of parameters of the process was denoted by $\theta = (P, p, \pi)$, where P
was the transition matrix, p was the multinomial vector of probability of
an arrival entering a particular class and $\pi$ was the vector of parameters
of the arrival distribution. The sampling distribution was then written as
follows:

$$f_\theta(z) = f_\theta(x, a) = f_\theta(x \mid A = a)\, f_\theta(a) = f_P(x \mid A = a)\, f_{(p,\pi)}(a)$$

The likelihood function could then be written as

$$L_z(\theta) = L_{x \mid A=a}(P) \cdot L_a(p, \pi)$$
Three reasons were given for the importance of the factorization
shown above:

"a. The first factor $L_{x \mid A=a}(P)$ depends on Z only through
the transition counts;

b. The second factor $L_a(p, \pi)$ depends on Z only on the
observed entries; and

c. Likelihood inference is reduced to two distinct and
simpler problems."

Anderson and Goodman [2] in 1957 proposed a number of statistical
tests for the following hypotheses:

a. that the transition probabilities of a first order chain
are constant;

b. that in case the transition probabilities are constant,
they are specified numbers;

c. that the process is a uth order Markov chain against the
alternative it is rth but not uth order.

Because of the factorization of the likelihood function Duncan and 
Lin concluded that the methods of Anderson and Goodman are applicable 
to a system with changing number of entities. 

Hallberg [11] in 1969 challenged one of the most demanding assumptions
of the Markov chain model, that the transition probabilities are
constant regardless of time. He proposed to overcome this problem by
relating transition probabilities to economic variables and using these
relations to predict future values of transition probabilities. For
reasons that were not stated, he regressed transition probabilities against the logarithms
of exogenous variables. Some predicted transition probabilities did not
fall within the range of zero to one. He then suggested setting
negative predictions to zero and normalizing each row of the transition
matrix by dividing each element by the row sum.
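The correction attributed to Hallberg above (set negative predictions to zero, then divide each element by the row sum) can be sketched as follows. The function name and the sample row are hypothetical; this is an illustration of the described procedure, not a reproduction of Hallberg's own computations.

```python
def clip_and_normalize(row):
    """Set negative predicted probabilities to zero and rescale the row to sum to one."""
    clipped = [max(0.0, p) for p in row]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("row has no positive predictions to normalize")
    return [p / total for p in clipped]


# Hypothetical row of regression predictions, one of which is negative.
predicted_row = [-0.1, 0.6, 0.6]
corrected_row = clip_and_normalize(predicted_row)
```

The procedure guarantees a valid probability row, but it changes every element of the row whenever one prediction is out of range, which is one motivation for the logit approach discussed in the Remarks.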

D. REMARKS

Despite the limitations of the basic Markov chain model it has been 
successfully used in a variety of situations. The Duncan and Lin approach 
extends the basic model to include arrivals and departures. This can be 
done with little additional effort. To extend the model to cover the possibility
of nonstationary transition probabilities is a considerably more
difficult task. The first problem is acquiring a data base which is large
enough to yield precise estimates of transition probabilities. The data
must also span a long period so that the factors which affect the transition
probabilities have an opportunity to vary. The second problem is to
identify these factors and to obtain a functional relationship between
them and the transition probabilities. The third problem is to predict the future values
of these factors. The prediction of the non-stationary Markov chain model 
is only as good as the prediction of these factors. The approach suggested 
by Hallberg can be improved by transforming the estimates of transition 
probabilities into logits (the logarithm of the estimates of odds of transition). 
This will ensure that the predicted transition probabilities are between 
zero and one. 
The basic Markov chain model is used in this paper to model the
behavior of a population of savers at a savings institution. The Duncan
and Lin approach is used to treat the phenomena of entries and exits. A
nonstationary Markov chain model (Model II of this paper) has also been
developed. The parameters of the models were estimated with data from
five quarters. The models were then validated with data from the following
five quarters. The chi-square goodness of fit test was used to compare
the predictive power of the two models.
II. MODEL OF A POPULATION OF SAVERS



A. GENERAL 

The population of savers is first divided into m classes by the
amount of savings each saver has in his savings account. Each saver is
free to increase or decrease his savings and to leave the savings institution
by closing his account. The population is observed periodically.
A projection of the structure of the population and the amount of savings
in each class, based on these observations, is desired. A Markov chain
model can be used for this purpose provided the basic assumptions of
the model are not violated.

B. ASSUMPTIONS 

1. The probability that an account moves from class i to class j 
depends only on class i and does not depend on the past history of the 
account. This is obviously not true for an individual account but possibly 
holds for the population of a given class. 

2. Each saver acts independently of other savers. If savers 
act in unison then a Markov model will fail as the assumption of inde- 
pendence is no longer valid. However, the assumption generally holds 
even if savers are affected by the same factors. The transition proba- 
bilities may shift because of these factors but the randomness in action 
of individual savers is still there. 
3. The distribution of the size of accounts within a class is
independent of the number of accounts that move in or out of that class.
This assumption is not required for the Markov model but is necessary if one
has to determine the amount of savings from a knowledge of the number
of accounts in each class. This assumption is generally true if the
number of accounts in each class is large relative to the net change in
each period. This assumption can be violated if the number of accounts
in each class is small and if the class boundaries are wide.

4. The transition probabilities, arrival rate and the distribution 
of entrants among states are time stationary. This assumption may hold 
during periods of constant expansion or contraction of the business cycle. 
However, it is not expected to hold over long periods and during times 
when external forces change the saving pattern of savers. This assumption
is relaxed in Model II, where an attempt was made to discover the
functional relationship of these parameters with economic factors and other
exogenous variables.

C. DESCRIPTION OF MODEL I 
1. The Transition Matrix

Model I has only one stochastic process, the basic Markov 
chain model. The number of arrivals is considered to be constant and the 
proportion of arrivals entering each class is also constant. 

Let m = total number of classes including one class of closed accounts

t = time, measured in periods, 0, 1, 2, ..., T

$e^t$ = the accumulated number of accounts that have closed at time t

$f^{t\prime} = (f_2^t\ f_3^t\ \ldots\ f_m^t)$ = number of accounts in each active class at t

$c' = (c_2^t\ c_3^t\ \ldots\ c_m^t)$ = number of new accounts entering each active class at time t

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}$$

Let class 1 be the class which contains all the closed accounts.
It is assumed that an account in the inactive state will not re-enter the
active states. Thus $p_{11} = 1.0$ and $p_{1j} = 0.0$, j = 2, ..., m.

The expected number of accounts at time t can be computed from
the following relationship [9]:

$$E(e^t\ f^{t\prime}) = (0\ f^{0\prime})\,P_t + (0\ c') \sum_{j=0}^{t-1} P_j$$

where t = 0, 1, ..., T and $P_0 = I$
The first term on the right of the equality sign is the expected
number of accounts in each class at time t from the original population
$f^0$. The second term is the expected number of accounts in each class
derived from those accounts which join the system at each period. Thus,
the accounts that arrive by period 1 would have undergone t-1 periods of
transition. Those that arrive by period 2 would have undergone t-2 periods
of transition. Those arriving at time t would undergo no transition as
$P_0 = I$.

As the stochastic process has been assumed to be time
stationary, the elements of the P matrix are constant and $P_t$ is just the
single period P matrix raised to the t-th power.

The expected total number of accounts in the system at
time t is just the sum of the elements of $E(f^{t\prime})$.

If the size distribution of accounts within each class is constant
over the period of prediction, then the amount of savings in each
class can be estimated by multiplying the expected number of accounts
by the average amount of savings in that class.
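The Model I projection can be sketched as a period-by-period recursion, which gives the same result as the closed-form expression when P is stationary and the entrant vector is constant. The matrix, the starting counts, the entrant vector and the average balances below are hypothetical illustration values, and the function names are not from the original study. Index 0 plays the role of class 1 (closed accounts).

```python
def step(state, P, entrants):
    """Advance one period: multiply the row vector by P, then add new accounts.

    state    -- (closed, f_2, ..., f_m) as a list
    P        -- m x m transition matrix, row 0 is the absorbing closed class
    entrants -- new accounts entering each active class (c_2, ..., c_m)
    """
    m = len(P)
    moved = [sum(state[i] * P[i][j] for i in range(m)) for j in range(m)]
    return [moved[0]] + [moved[j] + entrants[j - 1] for j in range(1, m)]


def expected_accounts(f0, P, entrants, t):
    """Expected (closed, f_2, ..., f_m) after t periods, starting with no closed accounts."""
    state = [0.0] + list(f0)
    for _ in range(t):
        state = step(state, P, entrants)
    return state


def expected_savings(state, avg_balance):
    """Expected savings per active class: expected count times average balance."""
    return [n * z for n, z in zip(state[1:], avg_balance)]


# Hypothetical example with one closed class and two active classes.
P = [[1.00, 0.00, 0.00],
     [0.10, 0.80, 0.10],
     [0.05, 0.15, 0.80]]
f0 = [100.0, 50.0]
c = [10.0, 5.0]

after_one = expected_accounts(f0, P, c, 1)
after_two = expected_accounts(f0, P, c, 2)
savings_one = expected_savings(after_one, [500.0, 3000.0])
```

Summing the active entries of the returned state gives the expected total number of active accounts, and summing `expected_savings` gives the expected total amount of savings.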

2. The Equilibrium Distribution

If prevailing conditions were to persist, the structure of the
population will reach an equilibrium in which the number of accounts
leaving each class is balanced by an equal influx of accounts from the
other classes. The limiting distribution is given by

$$\lim_{n \to \infty} E(e^n\ f^{n\prime}) = (\infty\ \ c'(I - Q)^{-1})$$

where Q is the sub-matrix of P obtained by removing the column of
transition probabilities from the classes of active accounts (Class II to
Class XI) into the class of closed accounts (Class I), and the row of
transition probabilities of the class of closed accounts.

The matrix $(I - Q)^{-1}$ is often called the fundamental matrix,
denoted by M. The ij-th element of this matrix is the expected number of
periods that a new account entering class i when it joins the system
will spend in class j before closing.

The expected number of periods that a new account entering
class i when it joins the system will remain in the system can be found
by summing the i-th row of the fundamental matrix.

The above results and further treatment can be found in
Chapter 3 of Ref. [13].
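As an illustration of the equilibrium results above, the sketch below computes the fundamental matrix, the equilibrium counts and the expected lifetimes for a hypothetical two-class Q. The Gauss-Jordan inversion is written out only to keep the example self-contained; the numbers and function names are assumptions made for the example, not values from this study.

```python
def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fundamental_matrix(Q):
    """M = (I - Q)^-1 for the active-class sub-matrix Q."""
    n = len(Q)
    return invert([[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
                   for i in range(n)])


def equilibrium_counts(c, Q):
    """Equilibrium number of accounts in each active class: c' M."""
    M = fundamental_matrix(Q)
    n = len(Q)
    return [sum(c[i] * M[i][j] for i in range(n)) for j in range(n)]


def expected_lifetimes(Q):
    """Row sums of M: expected periods in the system, by class of entry."""
    return [sum(row) for row in fundamental_matrix(Q)]


# Hypothetical two active classes; each row sums to less than one (closures).
Q = [[0.80, 0.10],
     [0.15, 0.80]]
c = [10.0, 5.0]

M = fundamental_matrix(Q)
f_eq = equilibrium_counts(c, Q)
lifetimes = expected_lifetimes(Q)
```

A quick consistency check is that the equilibrium vector reproduces itself: multiplying `f_eq` by Q and adding `c` returns `f_eq`.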

3. Prediction Interval for Single Step Transition

The predictions made with Model I are point estimates. They 
do not provide any information as to how close they could be to a future 
observation. A prediction interval which gives the range of values that 
a future observation would take say ninety percent of the time would be 
of greater value to a decision maker. 

The number of accounts in each class is the sum of m binomial 
random variables. If the number of accounts in each class is large then 
the binomial random variables can be approximated by normal random 
variables. The sum of normal random variables is another normal random 
variable. Thus a prediction interval can be constructed using this 
approximation. For a one-step transition the prediction interval can be
easily constructed. However, for transitions of more than one step the
task of constructing a prediction interval becomes rather difficult. The 
problem is that after the first transition the number of accounts in each 
class becomes random and the expression for the unconditional variance 
of the number of accounts becomes quite unmanageable. The expressions 
for the variance of the number of accounts in each class, the total number 
of accounts, the amount of savings in each class and the total amount 
of savings for single step transition are listed below. The derivation can 
be found in Appendix A. 

Let $n_j^a$ be the number of accounts in class j at beginning of time period a

$p_{ij}$ be the transition probability of an account from class i to j

$N^a$ be the number of accounts in the system at beginning of time period a

$Z_j^a$ be the amount of savings in class j at beginning of time period a

$Z^a$ be the total amount of savings in the system at beginning of time period a

$$\mathrm{Var}(n_j^{a+1}) = \sum_{i=2}^{m} n_i^a\, p_{ij}(1 - p_{ij})$$

$$\mathrm{Var}(N^{a+1}) = \sum_{j=2}^{m} \mathrm{Var}(n_j^{a+1}) + 2 \sum_{j=2}^{m-1} \sum_{k=j+1}^{m} \mathrm{Cov}(n_j^{a+1}, n_k^{a+1})$$

$$\mathrm{Cov}(n_j^{a+1}, n_k^{a+1}) = \sum_{i=2}^{m} -\,(n_i^a\, p_{ij}\, p_{ik}), \qquad j \neq k$$

Let $z_{kj}$ be the amount of savings in an account which has moved into class j

$$\mathrm{Var}(Z_j^{a+1}) = \sum_{i=2}^{m} \left[ n_i^a\, p_{ij}\, \mathrm{Var}(z_{kj}) + E^2(z_{kj})\, n_i^a\, p_{ij}(1 - p_{ij}) \right]$$

$$\mathrm{Cov}(Z_j^{a+1}, Z_l^{a+1}) = \sum_{i=2}^{m} -\,(n_i^a\, p_{ij}\, p_{il})\, E(z_{kj})\, E(z_{kl})$$

$$\mathrm{Var}(Z^{a+1}) = \sum_{j=2}^{m} \mathrm{Var}(Z_j^{a+1}) + 2 \sum_{j=2}^{m-1} \sum_{l=j+1}^{m} \mathrm{Cov}(Z_j^{a+1}, Z_l^{a+1})$$
Using these expressions the prediction intervals for a single
step transition are as follows:

90% Prediction Interval

of number of accounts in class i = $f_i \pm 1.645 \times (\mathrm{Var}(n_i))^{1/2}$

of total number of accounts = $\sum_{j=2}^{m} f_j \pm 1.645 \times (\mathrm{Var}(N))^{1/2}$

of amount of savings in class i = $s_i \pm 1.645 \times (\mathrm{Var}(Z_i))^{1/2}$

of total amount of savings = $\sum_{j=2}^{m} s_j \pm 1.645 \times (\mathrm{Var}(Z))^{1/2}$

where

$f_i$ = expected number of accounts in class i = $E(n_i)$ after one period

$s_i$ = expected amount of savings in class i = $E(Z_i)$ after one period

N = total number of accounts in the system after one period (random variable)

$n_i$ = number of accounts in class i after one period (random
variable)

$Z_i$ = amount of savings in class i after one period (random
variable)

Z = total amount of savings in the system after one period
(random variable)
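A minimal sketch of the single-step 90% prediction interval for the number of accounts in one class, under the normal approximation and with non-stochastic entrants, is given below. The counts, probabilities and entrant numbers are hypothetical, the matrix holds only the active-class transition probabilities, and the function name is an assumption made for this example.

```python
import math


def count_interval(n, Q, entrants, j, z=1.645):
    """90% prediction interval for the count in active class j after one period.

    n        -- current counts in the active classes
    Q        -- transition probabilities among the active classes
    entrants -- fixed (non-stochastic) number of new accounts per active class
    j        -- index of the class of interest within the active classes
    """
    rows = range(len(n))
    mean = sum(n[i] * Q[i][j] for i in rows) + entrants[j]
    # Each source class contributes an independent binomial count.
    variance = sum(n[i] * Q[i][j] * (1.0 - Q[i][j]) for i in rows)
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


# Hypothetical two active classes.
n = [100.0, 50.0]
Q = [[0.80, 0.10],
     [0.15, 0.80]]
c = [10.0, 5.0]

low, high = count_interval(n, Q, c, 0)
```

With these numbers the expected count is 97.5 and the variance is 100(0.8)(0.2) + 50(0.15)(0.85) = 22.375, so the interval is 97.5 plus or minus 1.645 times the square root of 22.375.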

The model can be extended to cover the case of stochastic
arrivals. Assuming the arrival process to be independent of the Markov
chain process, the expression for the number of accounts is the same as
for the case of non-stochastic arrivals. The only difference is in replacing
the vector of entrants (c') by the product of the expected number
of arrivals and the multinomial vector of probability of entering each
active class. Thus,

$$c' = E(R)\,(p_2\ p_3\ \ldots\ p_m)$$

where R = random number of arrivals

$p_i$, i = 2, 3, ..., m = probability of an arrival entering class i

c' = vector of entrants into the active classes

The expressions for variance are changed to take into account
the variability introduced by the additional stochastic processes.

Let $e_j^{a+1}$ be the random number of entrants into class j at time
period a+1

$r^{a+1}$ be the number of arrivals at time period a+1

R be the random variable of arrivals

$$\mathrm{Var}(e_j^{a+1} \mid R = r^{a+1}) = r^{a+1}\, p_j (1 - p_j)$$

$$\mathrm{Var}(e_j^{a+1}) = p_j (1 - p_j)\, E(R) + p_j^2\, \mathrm{Var}(R)$$

Since arrivals have been assumed to be independent of the
accounts in the system

$$\mathrm{Var}(n_j^{a+1}) = \mathrm{Var}(e_j^{a+1}) + \sum_{i=2}^{m} n_i^a\, p_{ij}(1 - p_{ij})$$

$$\mathrm{Var}(N^{a+1}) = \sum_{j=2}^{m} \mathrm{Var}(n_j^{a+1}) + 2 \sum_{j=2}^{m-1} \sum_{k=j+1}^{m} \mathrm{Cov}(n_j^{a+1}, n_k^{a+1})$$

$$\mathrm{Cov}(n_j^{a+1}, n_k^{a+1}) = \sum_{i=2}^{m} -\,n_i^a\, p_{ij}\, p_{ik}, \qquad j \neq k$$

Let $E(z_j)$ be the expected amount of savings in an account in class j

$z_{kj}$ be the amount of savings in an account which has just entered
class j

$$\mathrm{Var}(Z_j^{a+1}) = E(z_j)^2\, p_j (1 - p_j) + \sum_{i=2}^{m} \left[ n_i^a\, p_{ij}\, \mathrm{Var}(z_{kj}) + E(z_{kj})^2\, n_i^a\, p_{ij}(1 - p_{ij}) \right]$$

$$\mathrm{Cov}(Z_j^{a+1}, Z_l^{a+1}) = \sum_{i=2}^{m} -\,(n_i^a\, p_{ij}\, p_{il})\, E(z_{kj})\, E(z_{kl})$$

$$\mathrm{Var}(Z^{a+1}) = \sum_{j=2}^{m} \mathrm{Var}(Z_j^{a+1}) + 2 \sum_{j=2}^{m-1} \sum_{l=j+1}^{m} \mathrm{Cov}(Z_j^{a+1}, Z_l^{a+1})$$
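The unconditional variance of the number of entrants into class j, stated earlier in this subsection, follows from the law of total variance. The short derivation below is a sketch under the assumption that, given R = r arrivals, the number entering class j is binomial with parameters r and $p_j$, which is what the multinomial entrance process implies.

```latex
\begin{align*}
E(e_j \mid R) &= R\,p_j, \qquad \mathrm{Var}(e_j \mid R) = R\,p_j(1 - p_j) \\
\mathrm{Var}(e_j) &= E\big[\mathrm{Var}(e_j \mid R)\big] + \mathrm{Var}\big[E(e_j \mid R)\big] \\
                  &= p_j(1 - p_j)\,E(R) + p_j^{2}\,\mathrm{Var}(R)
\end{align*}
```

The first term is the binomial variability of where arrivals go; the second is the variability in how many arrive.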
D. DESCRIPTION OF MODEL II
1. The Arrival Process

It was observed that the number of new accounts opened in 
each quarter was between seven hundred and one thousand. For such 
large arrival rates, an assumption that the arrival rate is normally dis- 
tributed would be reasonable. However, it was felt that the arrival 
distribution could be affected by external factors like state of the national 
economy, seasonal effects and level of promotional or advertising activity 
of the savings institution. Thus the following linear econometric model 
was considered: 



$$Ar = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_{10} X_{10} + e$$

where

Ar = Number of new accounts opened in each quarter

$X_1$ = Dummy variable for quarters of the year

$X_2$ = California non-agricultural employment

$X_3$ = Advertising and promotional expense of the savings institution

$X_4$ = Prime commercial paper rate, 4-6 months

$X_5$ = U. S. Government securities rate, 6 months

$X_6$ = Corporation bonds rate

$X_7$ = Wholesale price index

$X_8$ = U. S. Government securities rate, 3 months

$X_9$ = California personal income

$X_{10}$ = U. S. total credit

e = Normally distributed random variable with zero mean and constant variance
The linear model was selected because of its simplicity
and because of the lack of data required by more complex models.

2. The Size Distribution of New Accounts

The size distribution of new accounts may also change with 
time and external conditions. To model this change, the probabilities 
of new accounts entering each class were related to the same set of 
exogenous variables listed in sub-section 1. Direct application of least 
squares to the probabilities may yield predictions of future values that 
are outside the zero to one range. To overcome this potential area of 
difficulty the estimates of the probabilities were first transformed into 
logits.

3. Logit Analysis

Logit analysis is a special application of Econometrics to 
situations in which the dependent variable has a dichotomous character. 
The object is to estimate the probability of occurrence of a specified 
event given a set of prevailing conditions. For application in this study 
one looks for the probability that a new account enters a particular class 
and the probability that an account will move from one class to another, 
given a set of external conditions. Direct application of least squares 
may result in the prediction of probabilities outside the zero and one 
range. A monotonic transformation can overcome this difficulty. One 
simple transformation is to divide the relative frequency by one minus 
the relative frequency. This quantity is an estimate of the odds of the 
occurrence of the event. This transformation is still restrictive as the 
new variable can take on only positive values. This problem can be
overcome by taking the logarithm of this quantity. The logarithm of the
estimated odds is termed the logit of an event. The model used to predict
future values of the parameters of the entrants distribution and the transition
probabilities of the transition matrix was as follows:

$$\log\big(p_i / (1 - p_i)\big) = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_{10} X_{10} + e$$

$$\log\big(p_{ij} / (1 - p_{ij})\big) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_{10} X_{10} + e$$

There is a further restriction that the sum of the probabilities
of the entrants distribution must equal one and the row sum of the transition
matrix should equal one too. The approach taken in this paper
was to sum up these predicted probabilities and then divide each by the
sum.
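The logit transformation, its inverse and the final normalization step can be sketched as follows. The natural logarithm is assumed here, since the text does not state the base, and the predicted logits in the example are hypothetical; the function names are not from the original study.

```python
import math


def logit(p):
    """Logarithm of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Map a predicted logit back to a probability strictly between zero and one."""
    return 1.0 / (1.0 + math.exp(-x))


def probabilities_from_logits(logits):
    """Back-transform each predicted logit, then divide by the sum so the row sums to one."""
    raw = [inverse_logit(x) for x in logits]
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical predicted logits for one row of the transition matrix.
predicted_logits = [-2.0, 1.0, -0.5]
row = probabilities_from_logits(predicted_logits)
```

Because the inverse transformation always returns a value strictly between zero and one, no clipping of negative predictions is needed before the row is normalized.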

4. Transition Matrix of Model II

The transition matrix of Model II is allowed to change with
external factors; thus the t-step transition matrix is no longer the single
step matrix raised to the t-th power but is the product of t matrices.

5. Predictions with Model II

To use Model II the first step would be to obtain predictions
of future values of those factors that are in the regression equations.
The parameters of the arrival process, entrance process and the transition
probabilities are then predicted. The expected number of accounts in
each class can then be computed by the following expression:
$$E(e^t\ f^{t\prime}) = (0\ f^{0\prime}) \prod_{k=0}^{t-1} P_k + \sum_{j=1}^{t} (0\ c^{j\prime}) \prod_{k=j}^{t-1} P_k$$

where an empty product is taken to be the identity matrix and

$e^t$ = cumulative number of closed accounts

$f^{t\prime} = (f_2^t\ f_3^t\ \ldots\ f_m^t)$ = vector of number of accounts in each
active class at time t, t = 0, 1, ..., T

$P_k$ = Transition matrix at time k, k = 0, 1, ..., T

$c^{t\prime} = E(Ar^t)\,(p_2^t\ p_3^t\ \ldots\ p_m^t)$ = Vector of expected number of entrants in each active
class at time t.

$$E(N^t) = \sum_{j=2}^{m} f_j^t$$

where

$E(N^t)$ = Expected total number of active accounts in the system,

$$E(Z_j^t) = f_j^t \times E(z_j)$$

where

$E(Z_j^t)$ = Expected total amount of savings in class j

$E(z_j)$ = Average amount of savings in each account in class j

$$E(Z^t) = \sum_{j=2}^{m} E(Z_j^t)$$

where

$E(Z^t)$ = Expected total amount of savings in the system at
time t.
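A period-by-period sketch of a Model II projection is given below. It rests on an indexing assumption that the text does not spell out: the k-th matrix in the list carries the population from time k to time k+1, and the k-th entrant vector holds the accounts that join at time k+1 and therefore undergo no transition in that step. All matrices, counts and entrant numbers are hypothetical, as is the function name.

```python
def project_nonstationary(f0, matrices, entrants):
    """Expected (closed, f_2, ..., f_m) after len(matrices) periods.

    f0       -- initial counts in the active classes
    matrices -- one m x m transition matrix per period (index 0 is the closed class)
    entrants -- one vector of expected entrants per period, active classes only
    """
    state = [0.0] + list(f0)
    for P, c in zip(matrices, entrants):
        m = len(P)
        moved = [sum(state[i] * P[i][j] for i in range(m)) for j in range(m)]
        state = [moved[0]] + [moved[j] + c[j - 1] for j in range(1, m)]
    return state


# Hypothetical predicted matrices for two successive quarters.
P_first = [[1.00, 0.00, 0.00],
           [0.10, 0.80, 0.10],
           [0.05, 0.15, 0.80]]
P_second = [[1.00, 0.00, 0.00],
            [0.20, 0.70, 0.10],
            [0.10, 0.10, 0.80]]
f0 = [100.0, 50.0]

state = project_nonstationary(f0, [P_first, P_second],
                              [[10.0, 5.0], [20.0, 10.0]])
total_active = sum(state[1:])
```

When every matrix and every entrant vector is the same, this recursion reduces to the stationary Model I projection.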
III. THE DATA AND ESTIMATION OF PARAMETERS



A. DESCRIPTION OF DATA BASE 

1. General

The data used in this study was obtained from the local branch 
of a savings institution. The population of passbook accounts was selected 
for study as it has greater mobility than other types of savings accounts. 

The quarterly earnings ledgers for 1971, 1972 and the first 
two quarters of 1973 were made available for this study. The quarterly 
earnings ledgers contain the following information which has a bearing
on this study: 

1. Identification number of each active account. 

2. Amount of savings as of the last day of each quarter. 

3. Amount of earnings for the quarter. 

4. Summary statistic of total number of active accounts. 

5. Summary statistic of total amount of savings. 

6. Summary statistic of total earnings withdrawn. 

7. Summary statistic of total earnings accrued. 

The basic Markov chain model requires the initial distribution
of the subject population and the transition probability matrix for complete
specification. A preliminary sample of two hundred accounts showed that
seventy-two percent of the population would have balances below two
thousand dollars. A very large random sample would, therefore, be required
to pick out the behavior of large accounts. It was decided to pick
a stratified sample instead. Thus, the sample of accounts examined
consisted of three blocks of about two hundred each. The first consisted
of accounts with balances exceeding ten thousand dollars on 31 March
1971. The second block consisted of accounts between two and ten thousand
dollars and the third block consisted of accounts below two thousand
dollars. The quarterly balance of each account was recorded. To determine
the initial distribution of the population, the amount of savings of all the 
accounts with balances exceeding one thousand dollars on 31 March 1972 
were recorded. The accounts were sorted by their order of magnitude and 
then divided into ten classes. The class intervals were selected to en- 
sure that the amount of savings in each class was of the same order of 
magnitude. The first eight classes uniformly spanned the interval $1 - 
$15,999. The ninth class contained all accounts between $16,000 - 
$19,999 and the tenth class covered the range from $20,000 - $100,000. 
Accounts exceeding $100,000 were rare; there were six of them in the 
31 March 1972 population. Including them in the largest class could 
result in an unstable mean of the amount of savings in that class; they 
were thus eliminated from the population. It is believed that these large 
accounts are important in the prediction of total amount of savings and
should, therefore, be treated separately. For the purpose of this study 
the amount of savings for accounts exceeding $100,000 was considered 
to be unchanged over the period of observation. 
2. Arrival Rate



The arrival rate was determined by taking the difference 
between the last identification numbers of consecutive quarters. This 
method failed to provide an accurate estimate of the arrival rate for 
Quarter IV-72. It was subsequently learned from the management that 
a block of about two hundred accounts were used to facilitate some 
financial transactions of newly arrived servicemen to Monterey. These 
accounts were subsequently closed. With this information the arrival 
rate for Quarter IV-72 was accordingly reduced. 

3. The Size Distribution of New Accounts

The distribution of new accounts was estimated by taking a
random sample of two hundred and fifty from the population of new accounts
for each quarter.

4. The Validation Sample

To test if the models with parameters estimated from six
hundred and twenty-two accounts could predict the behavior of the population,
a sample comprising one-fourth of the accounts of Quarter I-73
was taken to be used as a base for comparison. A chi square test was
performed to check if the predicted distribution fits the observation.

5. Summary Statistics

A second check on the predictive power of the model was
made by comparing the total number of accounts and total amount of
savings predicted for Quarters II-72 to II-73 against the summary statistics
for these quantities.
6. Total Number of Accounts

It was found that the statistics for total number of active
accounts included those that had been closed. It appeared that these
accounts were purged from the records about once a year. As this information
would be used as a check on the accuracy of prediction it had to
be precise; thus, a page count of each quarter's ledger was conducted.
The information on the total number of accounts and the arrival rate is
shown in Table I.



TABLE I

TIME SERIES OF TOTAL NUMBER OF
ACCOUNTS AND ARRIVAL RATE

QUARTER     # OF NEW     TOTAL # OF    MARGINAL
            ACCOUNTS     ACCOUNTS      CHANGE

I-71          UK           16895         UK
II-71         754          17059        +164
III-71        817          17181        +122
IV-71         599          17177        +96
I-72          778          17257        +80
II-72         860          17354        +97
III-72        791          17483        +129
IV-72         798          17752        +269
I-73          998          18013        +261
II-73         896          18087        +74

Nb: UK - Unknown



7. Total Amount of Savings

The trend in the total amount of savings was studied by 
fitting a least squares line through the observations. The data on total 
amount of savings are contained in Table II. 
[Figure: total number of savers in thousands, by quarter]
TABLE II

TIME SERIES OF TOTAL AMOUNT OF SAVINGS

QUARTER     TOTAL AMOUNT       MARGINAL        MEAN AMOUNT OF
            OF SAVINGS ($M)    CHANGE ($M)     SAVINGS ($)

I-71          36.8345            UK              2180.20
II-71         37.5140            0.6795          2199.07
III-71        38.8286            1.3146          2259.97
IV-71         39.5192            0.6905          2300.70
I-72          40.5565            1.0374          2350.15
II-72         41.5743            1.0177          2395.66
III-72        42.1492            0.5749          2410.87
IV-72         42.4047            0.2555          2388.73
I-73          44.1283            1.7273          2449.80
II-73         44.5614            0.4431          2463.73



The standard deviation of the amount of savings in each
account was estimated to be $5,314. The standard error of the mean was
estimated to be $40.54. Using the t test, any two means differing by
more than $66.86 are considered to be significantly different at the ten
percent level of significance. Thus the hypothesis that the mean was
constant over the period of observation was rejected. The average rate
of increase in the mean was found to be 1.1158 percent. This increase
could be partly accounted for by earnings accrued in the accounts. On
the average, 95.01 percent of the quarterly earnings was retained in the
institution; thus a quarterly increase of 1.219 percent in the mean could
be expected if there is no change in the structure of the population.

[Figure: graph of mean amount of savings vs time; vertical axis: mean amount of savings in thousand dollars; horizontal axis: quarter, I-71 to II-73]

The following results were obtained by fitting the trend line
to the total amount of savings:

(1) Mean of total savings            = 40.3899 million dollars
(2) Standard deviation               = 2.4153 million dollars
(3) Constant = a                     = 36.011 million dollars
(4) Coefficient = b                  = 0.876 million dollars per quarter
(5) Standard error of b              = 0.039 million dollars per quarter
(6) Coefficient of determination     = 0.986
(7) Standard error of dependent
    variable                         = 0.286 million dollars

During the period of observation the total amount of savings
was increasing at a constant rate of 0.876 million dollars per quarter.
The annual growth rate based on this would be 8.675%.

It was found, on the average, that 95.01% of earnings was
left in the accounts each quarter and so the annual growth rate caused
by new accounts and increases in existing accounts less losses due to
closing of accounts and reduction in levels of savings would be
8.675% - 0.9501 x 5.13% = 3.801%.

[Figure: total amount of savings in million dollars, by quarter]

A second regression was performed using the marginal change
as dependent variable. The following results were obtained:

(1) Mean of first differences        = 0.9117 million dollars per quarter
(2) Standard deviation               = 0.4622 million dollars per quarter
(3) Constant = a                     = 0.823 million dollars per quarter
(4) Coefficient = b                  = 0.020 million dollars per quarter^2
(5) Standard error of b              = 0.077 million dollars per quarter^2
(6) Coefficient of determination     = 0.011
(7) Standard error of dependent
    variable                         = 0.460 million dollars per quarter

It was concluded that there was no trend in the net change
of total savings in each quarter over the period of observation.
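
The first trend-line fit can be checked directly against Table II. The sketch below assumes that the regression used the nine quarters I-71 through I-73 indexed 1 to 9 (the reported mean of 40.3899 matches those nine values), that the annual growth rate was taken as four times the slope divided by the mean, and that 5.13% is the annual earnings rate. None of these conventions is stated explicitly in the text, and the helper name `fit_line` is illustrative.

```python
# Total savings ($M) for Quarters I-71 through I-73, copied from Table II.
savings = [36.8345, 37.5140, 38.8286, 39.5192, 40.5565,
           41.5743, 42.1492, 42.4047, 44.1283]
quarters = list(range(1, len(savings) + 1))  # assumed time index 1..9


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = fit_line(quarters, savings)
mean_savings = sum(savings) / len(savings)

# Assumed definition: annual growth = 4 quarters * slope / mean level.
annual_growth_pct = 100 * 4 * b / mean_savings
# Growth net of retained earnings (95.01% of an assumed 5.13% annual rate).
net_growth_pct = annual_growth_pct - 0.9501 * 5.13

print(f"a = {a:.3f}, b = {b:.3f}, mean = {mean_savings:.4f}")
print(f"annual growth = {annual_growth_pct:.2f}%, net = {net_growth_pct:.2f}%")
```

The computed constant and slope agree with the reported 36.011 and 0.876 to three decimals; the growth rates differ from the reported figures only in the third decimal because the text works from the rounded slope.
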



B. ESTIMATION OF PARAMETERS 
1. Model I

The arrival rate can be estimated by adding up all the new 
accounts opened during the period of observation and dividing by the 
number of time periods. 

The distribution of new accounts can be estimated by taking 
samples from each batch of new accounts, adding up the accounts entering 
each class and dividing by the total number of accounts in the sample. 
Thus:

$$\hat{p}_j = \frac{\sum_{t=1}^{T} c_j(t)}{\sum_{t=1}^{T} r(t)}$$

where

$\hat{p}_j$ = maximum likelihood estimate of the probability of a new
account entering the j-th class

$c_j(t)$ = number of new accounts entering the j-th class at time t

$r(t)$ = number of accounts in the sample of new accounts at time t

T = number of periods of observation

The average number of accounts entering each class can be
found by:

$$c' = \widehat{Ar} \, (\hat{p}_2, \hat{p}_3, \ldots, \hat{p}_m)$$

where

c' = average number of new accounts entering each class at each
time period

$\widehat{Ar}$ = maximum likelihood estimate of the arrival rate

The stationary transition probabilities can be estimated by
the following [2]:

$$\hat{p}_{ij} = \frac{n_{ij}}{n_{i\cdot}} = \frac{\sum_{t=1}^{T} n_{ij}(t)}{\sum_{t=1}^{T} \sum_{k=1}^{m} n_{ik}(t)} = \frac{\sum_{t=1}^{T} n_{ij}(t)}{\sum_{t=1}^{T} n_i(t-1)}$$

where

$\hat{p}_{ij}$ = maximum likelihood estimate of the probability
of transition from class i to class j in any one given period

$n_{ij}$ = total number of accounts that have moved from
class i to class j over the period of observation (0 - T)

$n_{i\cdot}$ = total number of accounts that were in class i at
the beginning of each period

$n_{ij}(t)$ = number of accounts that moved from class i to
class j during the period between t-1 and t

$n_{ik}(t)$ = number of accounts that moved from class i to
class k during the period between t-1 and t

$n_i(t-1)$ = total number of accounts in class i at the time
period (t-1)
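
As an illustration of the estimator, the sketch below pools per-period transition counts and divides each row by its sum. The function name `stationary_estimate` and the small count matrices are hypothetical and are not taken from the thesis data.

```python
def stationary_estimate(count_matrices):
    """Pool per-period transition counts, then divide each row by its sum.

    count_matrices[t][i][j] is the number of accounts that moved from
    class i to class j during period t.
    """
    m = len(count_matrices[0])
    pooled = [[sum(c[i][j] for c in count_matrices) for j in range(m)]
              for i in range(m)]
    estimate = []
    for row in pooled:
        total = sum(row)  # accounts that started any period in this class
        estimate.append([n / total if total else 0.0 for n in row])
    return pooled, estimate


# Hypothetical counts for two periods and three classes.
counts_q1 = [[8, 2, 0], [1, 7, 2], [0, 1, 9]]
counts_q2 = [[9, 1, 0], [2, 6, 2], [0, 2, 8]]

pooled, p_hat = stationary_estimate([counts_q1, counts_q2])
for row in p_hat:
    print([round(p, 3) for p in row])
```
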



Anderson and Goodman [2] showed that as n, the total number
of entities in the system, tends to infinity the set
$(n_{i\cdot})^{1/2} (\hat{p}_{ij} - p_{ij})$
has a joint normal distribution with means 0, variances $p_{ij}(1 - p_{ij})$ and
covariances $-\delta_{ig} p_{ij} p_{gh}$, where $\delta_{ig} = 0$ if $i \neq g$ and $\delta_{ig} = 1$ if $i = g$.

This fact can be used to test if certain transition probabilities
$p_{ij}$ have specified values $p^0_{ij}$ and if the transition probabilities are indeed
stationary.

2. Statistical Tests

The chi square test of goodness of fit can be used to test
hypotheses concerning transition probabilities. To test the hypothesis
that $p_{ij} = p^0_{ij}$, j = 1, 2, ..., m, the quantity

$$\sum_{j=1}^{m} n_{i\cdot} \frac{(\hat{p}_{ij} - p^0_{ij})^2}{p^0_{ij}}$$

under the null hypothesis has an asymptotic chi square distribution with
m-1 degrees of freedom. The null hypothesis is rejected if $\hat{p}_{ij}$ differs
from $p^0_{ij}$ to such an extent that the above test statistic exceeds the
$(1 - \alpha)$ percentile of the chi square distribution with m-1 degrees of
freedom, where $\alpha$ is the level of significance.

As the variables $n_{i\cdot}(\hat{p}_{ij} - p^0_{ij})^2$ for different i are independent,
the summation over i is distributed as a chi square distribution with
m(m - 1) degrees of freedom.
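
A minimal sketch of the test for a single row follows. The numbers are hypothetical, and `row_chi_square` is an illustrative name. The closed-form critical value used here holds only for two degrees of freedom, where the chi square distribution is exponential with mean 2; other degrees of freedom need a table or a statistics library.

```python
import math


def row_chi_square(n_i, p_hat_row, p0_row):
    """Sum over j of n_i * (p_hat_ij - p0_ij)**2 / p0_ij for one class i."""
    return sum(n_i * (ph - p0) ** 2 / p0
               for ph, p0 in zip(p_hat_row, p0_row) if p0 > 0)


# Hypothetical row: 200 accounts started in class i; three classes (m = 3).
n_i = 200
p_hat_row = [0.70, 0.20, 0.10]   # estimated transition probabilities
p0_row = [0.75, 0.15, 0.10]      # hypothesized values

statistic = row_chi_square(n_i, p_hat_row, p0_row)
degrees_of_freedom = len(p0_row) - 1

# For exactly 2 degrees of freedom the (1 - alpha) percentile is -2 ln(alpha).
alpha = 0.10
critical = -2.0 * math.log(alpha)
reject = statistic > critical

print(f"chi square = {statistic:.3f}, critical value = {critical:.3f}, reject = {reject}")
```
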

To test the hypothesis that the transition probabilities are
stationary over the period of observation the following test statistic can
be used [2]:

$$\chi^2 = \sum_{i=1}^{m} \chi_i^2, \qquad \chi_i^2 = \sum_{t=1}^{T} \sum_{j=1}^{m} n_i(t-1) \frac{[\hat{p}_{ij}(t) - \hat{p}_{ij}]^2}{\hat{p}_{ij}}$$

where

$n_i(t-1)$ = total of entities in class i at time t-1

$\hat{p}_{ij}(t)$ = estimate of the transition probability at time t,
obtained by counting the number of transitions from
class i to class j and dividing by $n_i(t-1)$

$\hat{p}_{ij}$ = estimate of the transition probability from class i to
class j, $\hat{p}_{ij} = \sum_{t} n_{ij}(t) / \sum_{t} n_i(t-1)$

The asymptotic distribution of this test statistic is chi square
with m(m-1)(T-1) degrees of freedom. The number of degrees of freedom
is reduced from m(m-1)T by m(m-1), the number of parameters estimated.
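
The statistic can be sketched as follows. The count matrices are hypothetical and `stationarity_statistic` is an illustrative name. Cells whose pooled estimate is zero are skipped here to avoid division by zero, which is a simplification rather than the pooling rule applied in the study.

```python
def stationarity_statistic(count_matrices):
    """Chi square statistic for stationarity of transition probabilities.

    count_matrices[t][i][j] is the number of transitions from class i to
    class j in period t. Returns (statistic, m*(m-1)*(T-1)).
    """
    m = len(count_matrices[0])
    periods = len(count_matrices)
    pooled = [[sum(c[i][j] for c in count_matrices) for j in range(m)]
              for i in range(m)]
    statistic = 0.0
    for counts in count_matrices:
        for i in range(m):
            n_start = sum(counts[i])        # n_i(t-1)
            pooled_total = sum(pooled[i])
            if n_start == 0 or pooled_total == 0:
                continue
            for j in range(m):
                p_pooled = pooled[i][j] / pooled_total   # stationary estimate
                if p_pooled == 0:
                    continue
                p_t = counts[i][j] / n_start             # estimate for period t
                statistic += n_start * (p_t - p_pooled) ** 2 / p_pooled
    return statistic, m * (m - 1) * (periods - 1)


# Hypothetical counts for two periods and three classes.
counts_q1 = [[8, 2, 0], [1, 7, 2], [0, 1, 9]]
counts_q2 = [[9, 1, 0], [2, 6, 2], [0, 2, 8]]
chi_square, dof = stationarity_statistic([counts_q1, counts_q2])
print(f"chi square = {chi_square:.3f} with {dof} degrees of freedom")
```
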
The chi square test is based on a statistic which follows a
chi square distribution when n, the total number of entities in the system,
tends to infinity. Hence it has been customary for statistics textbooks
to recommend that the smallest expected number of entities in each class
should exceed five or ten. If this requirement is not met in the original
classification then combination of neighboring classes, until the rule
is satisfied, is recommended. Cochran [4] challenged this arbitrary
rule, claiming that the power of the test is reduced by pooling classes to
conform to the rule. He found that for goodness of fit tests of bell shaped
curves such as the normal distribution there is little disturbance to the
five percent level when a single expectation is as low as 1/2. He
further stated that the result is also true for the one percent level if the
number of degrees of freedom exceeds six, and that two expectations
as low as one may be allowed with negligible disturbance to the five
percent level.

Using Cochran's results, classes with small expectations
were pooled to ensure that the smallest expected number of entities in
each class exceeded one and no more than two classes had expected
numbers less than two. The number of degrees of freedom was reduced
from m(m-1)(T-1) by the number of classes eliminated.

3. Model II 

The predictor for arrival rate may be obtained by applying 
the method of least squares to the number of new accounts observed in 
each time period and the corresponding exogenous variables. 
The distribution of new accounts is estimated in each period
by dividing the number of new accounts entering each class by the total
number of accounts in the sample.

The transition probabilities $p_{ij}(t)$ are estimated by dividing
the number of accounts that moved from class i to class j at time t by
the number of accounts in class i at time t-1.

These estimates are maximum likelihood estimates as in 
Model I. They can be transformed into logits and then regressed against 
the set of exogenous variables. 

4. Estimation of Transition Probabilities

Each of the six hundred and twenty-two accounts was categorized
in accordance with the classification given in Section A.1 of
this chapter. The number of accounts in each class for each quarter
during the period of observation is presented in Table III. The relative
fraction of accounts, obtained by dividing the number of accounts in each
class by six hundred and twenty-two, is shown in Table IV.

It can be seen that twenty-seven percent of the accounts in 
the sample were closed after ten quarters. The proportion of active 
accounts in each class was found by dividing the number of accounts in 
each class by the total number of active accounts. The results are pre- 
sented in Table V. The time series of amount of savings in each class 
is presented in Table VI. 

A chi square test was performed to test if the distribution of
active accounts had changed during the period of observation. The number
of degrees of freedom of the distribution of the chi square statistic is
eighty-one and the ninetieth percentile of the distribution is 98.01. The
chi square statistic was found to be 64.2. Thus, the null hypothesis
that the distribution did not change with time could not be rejected. This
result was rather surprising as it could imply that the probability of an
account closing did not depend on the class it was in.

[Table: time series of distribution of accounts in the sample of 622 accounts]

[Table: time series of amount of savings in each class in the sample]

Each account was examined at each quarter to determine if
it had made a transition to another class. The transitions were accumulated
in a transition count matrix. The ij-th element of this matrix is the
number of transitions from the i-th class to the j-th class in a given quarter.
An example of a transition count matrix is shown in Table VII. The
transition count matrices for other quarters are contained in Appendix B.

The estimate of each quarter's transition matrix was obtained
by the method described earlier in this section. An example, the estimate
of the transition matrix of Quarter II-71, is shown in Table VIII. The
estimates for subsequent quarters are contained in Appendix C.

A cumulative transition count matrix was formed by adding
successive transition count matrices. Thus the cumulative transition
count matrix of Quarter I-72 is the sum of the transition count matrices
of Quarters II-71, III-71, IV-71 and I-72. The cumulative transition
count matrices are contained in Appendix D.

The time stationary estimate of the transition matrix was
obtained by dividing each element of the cumulative transition count
matrix by its row sum. For the sake of brevity the estimate of the transition
matrix was termed CPM Z, where Z was a Roman numeral indicating that
the data used in the estimation came from the first Z quarters of the period
of observation. Thus CPM V stands for the estimate of the stationary
transition matrix using data from Quarter I-71 to Quarter I-72. CPM II
through CPM X are contained in Appendix E.

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER I AND QUARTER
SUM 17 182 92 60 50 24 76 38 29 28 26 622

ESTIMATE OF TRANSITION MATRIX BETWEEN QUARTER I AND QUARTER
0400 0.0 0.0 0.0400 0.0400 0.0400 0.0400 0.0 0.0400 0.0400 0.7200

5. Test of Time Stationary Assumption

It can be seen from the transition count matrices that there 
are a large number of elements with zero or one transition counts. The 
chi square test could not, therefore, be applied directly. The classes 
of each row were combined so that the smallest class had an expectation 
exceeding one count and no more than two classes had expectation of 
less than two counts. The following grouping was obtained: 



Class    I     II    III   IV    V     VI    VII   VIII  IX    X     XI

II      .046  .883  .054   -     -     -     -     -     -     -    .017
III     .023  .110  .733  .104  .015   -     -     -     -     -    .015
IV      .040  .040  .075  .711  .089   -     -     -     -     -    .046
V       .050   -    .130   -    .672  .104   -     -     -     -    .046
VI      .081   -    .147   -     -    .536   -     -     -     -    .237
VII     .044  .087   -     -     -     -    .760  .084   -     -    .026
VIII    .064   -     -    .109   -     -     -    .611  .169   -    .049
IX      .075   -     -     -    .083   -     -    .083  .636   -    .123
X       .067   -     -    .102   -     -     -     -     -    .712  .120
XI      .042   -     -     -    .097   -     -     -     -     -    .861



The number of degrees of freedom for the above matrix is
equal to the number of elements minus the number of linear constraints,
(47-10). As the number of matrices is nine and the number of parameters
estimated is (47-10), the number of degrees of freedom for the distribution
of the chi square statistic for the test of stationary transition probability
matrix is (47-10)(9-1) = 296.

The rejection region at the 10% level of significance begins at 328.6.
The chi square statistic was found to be 288.7; thus the null hypothesis
that the transition probabilities were stationary could not be rejected.

6. The Initial Distribution of the Population

The initial distribution of the population was determined by
recording all accounts with balance exceeding one thousand dollars on
31 March 1972. The number of accounts below one thousand dollars
was found by taking the difference between the total number of accounts
and the number of accounts recorded. The mean and variance of the amount
of savings in an account in each class were estimated from this sample.
Table IX is a summary of the data obtained.

TABLE IX

SIZE DISTRIBUTION OF THE ENTIRE POPULATION
OF ACCOUNTS AT QUARTER I-72

CLASS    INTERVAL ($)       NUMBER OF    MEAN ($)    VARIANCE
                            ACCOUNTS

I        0                       0            0      0
II       1 - 1999            12373          353      246544
III      2000 - 3999          1793         2837      310372
IV       4000 - 5999          1034         4916      317481
V        6000 - 7999           563         6855      328649
VI       8000 - 9999           366         8905      346948
VII      10000 - 11999         372        10757      362291
VIII     12000 - 13999         209        12920      329649
IX       14000 - 15999         153        14961      314260
X        16000 - 19999         183        17791      1355376
XI       20000 - 99999         205        27888      110502144
         Over 100000             6       156558      2.983 x 10^9

It can be seen that the estimate of the mean of each class,
except for Classes II and XI, is close to the midpoint of the respective
class interval. All the means are below the midpoints as there are more
accounts at the lower end of each class. The estimates of variance of
Classes II to IX are very close because the class intervals are the same
and the distribution of accounts in each class has the same general
shape. The estimates of variance for Classes X and XI show the importance
of length of class interval on predictions of total amount of savings.

The variance of the amount of savings of accounts in Classes X and XI
can be reduced by the introduction of more classes to cover the same
interval. However, this could lead to classes having smaller populations
which may not possess the Markovian property.

As a compromise, class intervals were selected
such that each class had a minimum of one hundred and fifty accounts.

The six accounts that exceeded $100,000 were considered to be unchanged 
during the period of observation. These accounts added up to $0.94 
million. Thus the predicted amount of total savings could differ by one 
million dollars because of the action of a handful of savers. 
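
The class counts and means of Table IX can be cross-checked against the summary statistics. The sketch below multiplies each class count by its class mean and adds the six large accounts. The variable names are illustrative, and the class means are the rounded values printed in the table, so the total is approximate.

```python
# (class, number of accounts, mean balance in dollars), copied from Table IX.
table_ix = [
    ("II", 12373, 353), ("III", 1793, 2837), ("IV", 1034, 4916),
    ("V", 563, 6855), ("VI", 366, 8905), ("VII", 372, 10757),
    ("VIII", 209, 12920), ("IX", 153, 14961), ("X", 183, 17791),
    ("XI", 205, 27888),
]
large_count, large_mean = 6, 156558  # accounts over $100,000, treated separately

class_total = sum(count * mean for _, count, mean in table_ix)
large_total = large_count * large_mean
total_accounts = sum(count for _, count, _ in table_ix) + large_count
total_savings = class_total + large_total

print(f"accounts: {total_accounts}")
print(f"savings in Classes II-XI: ${class_total / 1e6:.3f} million")
print(f"savings in large accounts: ${large_total / 1e6:.3f} million")
print(f"total savings: ${total_savings / 1e6:.3f} million")
```

The account total reproduces the 17257 accounts shown for Quarter I-72 in Table I, and the savings total of about $40.56 million is close to the $40.5565 million shown in Table II.
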
7. The Size Distribution of New Accounts

Each new account of the samples of new accounts was classified
according to the rule given in Section A.1 of this chapter. The
number of new accounts in each class for Quarter II-71 through Quarter
II-73 is shown in Table XI.

The maximum likelihood estimate of the probability of a new 
account entering each class was obtained by dividing the number of new 
accounts in each class by the total number of new accounts. The quarterly 
estimates of the probability of a new account entering each class and the 
time stationary estimates are presented in Table XII. 

A chi square test was performed to test the hypothesis that 
the probabilities were time stationary. The number of degrees of freedom 
of the distribution of the chi square statistic was seventy-two and the 
ninetieth percentile of the distribution is 87.84. The chi square statistic 
obtained was 68.8. Thus the null hypothesis could not be rejected at 
the ten percent level of significance. 

As a further check a one way analysis of variance was performed.
The results are as follows:

Total number of observations   = 2250
Average of all observations    = 2535.38
Standard error within groups   = 8732.41
Degrees of freedom             = 2241
Standard error between groups  = 11488.08
Degrees of freedom             = 8
F statistic                    = 1.73
Level of significance          = 0.0865

Thus the null hypothesis that the mean amount of savings
of new accounts is constant over the period of observation is rejected
at the 10% level of significance.
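
The reported F statistic can be recovered from the two "standard error" figures if they are read as square roots of the between-group and within-group mean squares. That reading is an assumption, though it does reproduce the value of 1.73. The variable names are illustrative.

```python
groups = 9            # quarterly samples, II-71 through II-73
sample_size = 250     # accounts per sample

rms_between = 11488.08   # "standard error between groups" from the text
rms_within = 8732.41     # "standard error within groups" from the text

# Degrees of freedom for a one way analysis of variance.
dof_between = groups - 1
dof_within = groups * sample_size - groups

# Assumed: F is the ratio of mean squares, i.e. the squared ratio of the figures above.
f_statistic = (rms_between / rms_within) ** 2

print(dof_between, dof_within, round(f_statistic, 2))
```
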

The mean and standard deviation of the amount of savings of
the samples of new accounts are as follows:

TABLE X

MEAN, STANDARD DEVIATION, MEDIAN, MAXIMUM
VALUE AND MINIMUM VALUE OF SAMPLES OF NEW ACCOUNTS

Quarter    Mean ($)    Standard     Median    Maximum    Minimum
                       Deviation              Value      Value

II-71      1671.34      3615.79     279.5      25000.      1.
III-71     1960.13      5038.32     301.5      52518.      1.
IV-71      2500.38      6561.85     300.0      50000.      1.
I-72       2169.10      5553.17     224.5      40000.      2.
II-72      3193.56      8641.02     340.50    103157.      1.
III-72     2812.04      8264.18     282.50    100032.      1.
IV-72      2271.53      7642.48     146.50    100000.      1.
I-73       4054.80     18161.52     238.5     200000.      1.
II-73      2185.53      6536.75     101.5      50000.      2.

Nb. sample size = 250

Duncan's Multiple Range Test showed that the means of
Quarters II-71, III-71, IV-71, I-72, IV-72 and II-73 are significantly
different from that of Quarter I-73 at the ten percent level of significance.
The means of Quarters II-71 and II-72 are also significantly different
at the ten percent level of significance. The differences between the
means of other quarters were not considered significant.
TIME SERIES OF DISTRIBUTION OF SAMPLE OF NEW ACCOUNTS
SUM 250 250 250 250 250 250 250 250 250 225

TIME SERIES OF ESTIMATE OF DISTRIBUTION OF NEW ACCOUNTS
0120 0.0120 0.0320 0.0240 0.0480 0.0320 0.0280 0.0320 0.0320 0.0280

AMOUNT OF SAVINGS IN EACH CLASS IN THE SAMPLE OF NEW ACCOUNTS
The means are greatly influenced by the large accounts.
The mean of Quarter I-73 would drop to $3267.87 if the $200,000 account
were deleted from the sample. This reduced mean would be significantly
different from that of Quarter II-71 only.

Deleting accounts that were greater than $100,000 from the
samples reduced the means of Quarters II-72, III-72, IV-72 and I-73 to
2792.10, 2421.59, 1879.04 and 2292.24 respectively. The maximum
difference between the means is 1120.76 which is considered insignificant
at the ten percent level of significance.
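
The effect of deleting a large account can be reproduced from the sample mean and sample size alone. The sketch below uses the Table X figures for Quarters I-73 and II-72 and assumes that exactly one account (the sample maximum) is removed in each case. The helper name `mean_without` is illustrative.

```python
def mean_without(sample_mean, sample_size, removed_values):
    """Mean of a sample after deleting the listed observations."""
    total = sample_mean * sample_size - sum(removed_values)
    return total / (sample_size - len(removed_values))


# Quarter I-73: delete the single $200,000 account from the sample of 250.
mean_i73 = mean_without(4054.80, 250, [200000])
# Quarter II-72: delete the $103,157 account.
mean_ii72 = mean_without(3193.56, 250, [103157])

print(round(mean_i73, 2), round(mean_ii72, 2))
```
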

8. Predictors of Transition Probabilities

The corresponding estimates of transition probabilities of
each quarter were grouped together, transformed into logits and regressed
against the following set of exogenous variables:

$X_1$ = Dummy variable for quarters of the year

$X_2$ = California non-agricultural employment

$X_3$ = Advertising and promotional expense of the savings
institution

$X_4$ = Prime commercial paper rate, 4-6 months

$X_5$ = U. S. Government securities rate, 6 months

$X_6$ = Corporation bonds rate

$X_7$ = Wholesale price index lagged by one period

$X_8$ = U. S. Government securities rate, 3 months

$X_9$ = California personal income

$X_{10}$ = U. S. total credit

The values of these variables are contained in Table XIV.
TIME SERIES OF EXOGENOUS VARIABLES USED IN THE REGRESSIONS
II-71 III-71 IV-71 I-72 II-72 III-72 IV-72 I-73 II-73

Sources: (1) Federal Reserve Bulletin
         (2) California Economic Indicators, June 1973

There was some difficulty in transforming the transition
probabilities as a number of them were equal to zero and the logit of
zero is minus infinity. The following rule was used to get around this
problem:

1. If there are more than two estimates for $p_{ij}(t)$, t = II-71,
III-71, ..., I-73, equal to zero, assume that $p_{ij}(t)$ is
constant over the period of observation and use the time
stationary estimate obtained for Model I. No regression
will be performed for these elements.

2. If there are one or two zeros in the estimates, replace the
zeros by the time stationary estimate and proceed with logit
transformation and regression.

The number of transition probabilities removed by these rules
was seventy-two. As there were one hundred and ten elements in the
transition matrix that required estimation, application of these rules left
a balance of thirty-eight elements for regression.
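
The two rules above can be sketched as a small preprocessing step. The function names and the sample series are hypothetical. The natural logarithm is used here for the logit, and estimates equal to one are not handled.

```python
import math


def logit(p):
    """Natural-log logit; undefined for p = 0 or p = 1."""
    return math.log(p / (1.0 - p))


def prepare_for_regression(quarterly_estimates, stationary_estimate):
    """Apply the two rules for zero estimates of one matrix element.

    Returns None when the element is treated as constant (rule 1),
    otherwise the logits with zeros replaced by the stationary
    estimate (rule 2).
    """
    zeros = sum(1 for p in quarterly_estimates if p == 0)
    if zeros > 2:
        return None
    filled = [p if p > 0 else stationary_estimate for p in quarterly_estimates]
    return [logit(p) for p in filled]


# Hypothetical estimates of one element over eight quarters.
series_a = [0.10, 0.0, 0.12, 0.08, 0.11, 0.09, 0.10, 0.0]   # two zeros
series_b = [0.0, 0.0, 0.02, 0.0, 0.01, 0.0, 0.03, 0.02]     # four zeros

logits_a = prepare_for_regression(series_a, stationary_estimate=0.08)
logits_b = prepare_for_regression(series_b, stationary_estimate=0.01)
print(logits_a)
print(logits_b)
```
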

The transition matrix for Quarter II-73 was not included in
the regression in order that it could be used to test the correctness of
the predictors obtained with data from earlier periods. Thus, there were
eight data points in the regressions instead of nine.

In the first regressions performed, it was found that $X_8$,
U. S. Government securities rate, 3 months, $X_9$, California personal
income and $X_{10}$, U. S. total credit were highly correlated with each other
and some of the other exogenous variables (R > .98). To reduce the
problem of multicollinearity, these three variables were dropped from
the regression equations.

The following criteria were used to determine if the variance 
of the logits of transition could be explained by the exogenous variables: 

1. The F statistic obtained by the ratio of the estimate of the 
variance before and after the introduction of an independent 
variable must exceed 2.06, the eightieth percentile of the 
F(7,6) distribution. 

2. The coefficient of determination, $R^2$, must exceed 0.70.

Of the thirty-eight regressions only ten were found to be
significant according to these criteria. As each row of the transition
matrix would be divided by the sum of its elements these ten elements
could cause significant changes to the transition matrix.

The predictors for the ten logits of transition, obtained by
regression, are as follows:

L(7,9)    = -5.784 + 0.143X_1 + 0.140X
                     (0.073)    (0.052)

L(10,11)  =  7.557   0.382X_1   0.083X
                     (0.080)    (0.056)

L(11,10)  = -6.469 + 0.758X_5
                     (0.087)

                   + 0.130X     7.066X
                     (0.094)    (2.275)



These logits were then transformed back into probabilities by
taking the anti-logarithms and dividing by one plus the anti-logarithms
of the logits. Thus,

$$p_{ij} = \exp(L_{ij}) / (1 + \exp(L_{ij}))$$

The frequency of appearance of each exogenous variable is
as follows:

VARIABLE    FREQUENCY

   1            6
   2            0
   3            4
   4            2
   5            5
   6            0
   7            4

The estimates of transition probabilities that were found to
vary significantly with the set of exogenous variables appeared to have
a seasonal effect as the dummy variable appeared most frequently in the
regressions.

An increase in $X_5$, U. S. Government securities rate, would
result in an increase in the probability of an account moving from
Class XI to Class X. A possible explanation is that savers in Class XI
will reduce their passbook account savings and invest in U. S. Government
securities when the securities rate increases. However, a consistent set
of explanations could not be given for the ten predictors so a non-causal
approach had to be followed.

Nb. The number in brackets below each regression coefficient is the
standard error of the coefficient.

The transition probabilities without predictors were considered
to be stationary during the period of observation. Thus the nonstationary
matrix was formed by replacing ten elements of the estimate of the stationary
matrix with predicted values. To ensure that each row adds up to one, each
element was divided by the row sum. Selected transition matrices used
in Model II are contained in Appendix F.
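
The construction of a nonstationary matrix can be sketched as follows. The three-class matrix, the predicted logit and the function names are hypothetical. Indices are zero-based in the code, and the natural-log logit is assumed.

```python
import math


def inverse_logit(value):
    """Transform a natural-log logit back into a probability."""
    return math.exp(value) / (1.0 + math.exp(value))


def nonstationary_matrix(stationary, predicted_logits):
    """Replace selected elements with predicted values, then renormalize rows.

    predicted_logits maps (row, column) to a predicted logit.
    """
    matrix = [row[:] for row in stationary]   # copy; leave the input untouched
    for (i, j), value in predicted_logits.items():
        matrix[i][j] = inverse_logit(value)
    return [[p / sum(row) for p in row] for row in matrix]


# Hypothetical stationary estimate for three classes.
stationary = [[0.85, 0.15, 0.00],
              [0.15, 0.65, 0.20],
              [0.05, 0.15, 0.80]]
# Hypothetical predicted logit for the transition from class 2 to class 1.
predicted = {(2, 1): -1.0}

matrix_t = nonstationary_matrix(stationary, predicted)
for row in matrix_t:
    print([round(p, 4) for p in row])
```
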

A chi square test was performed to test if the predictors could
predict the transition matrix for Quarter II-73. The predicted matrix was
formed by replacing ten elements of the Quarter I-73 cumulative matrix
with values obtained with the predictors and normalizing each row. The
problem of small expected number of transitions in certain elements of the
matrix was resolved by combining classes of each row in the manner
described in Section B.5. The ninetieth percentile for the chi square
distribution with 37 degrees of freedom is 48.84. The chi square statistic
obtained in the test was 35.25; thus, the null hypothesis, that the predicted
matrix and the observed matrix of Quarter II-73 were the same,
could not be rejected.

9. Predictors of Arrival Rate

The number of new accounts opened in each quarter was
regressed against the same set of exogenous variables listed in sub-section
8. The predictor of arrival rate, measured in thousands per quarter,
was found to be as follows:

Ar = 0.052 - 0.073X_1 + 0.094X_5
            (0.017)     (0.029)

The standard error of each coefficient is contained in the
bracket below each coefficient. The square of the multiple correlation
between the arrival rate and the exogenous variables, $X_1$ and $X_5$, was
0.846. The standard error of Ar before and after the regression was
0.7887 and 0.045.



According to this predictor, the number of new accounts
opened per quarter decreases as the year progresses, as $X_1$, the dummy
variable for quarters, takes on values 1, 2, 3 and 4 for the four quarters
of the year. The number of new accounts opened would also increase as
the U. S. Government securities rate increases. No apparent reasons
could be found for this relationship. Predictions are compared with
observations in the following table.



TABLE XV

PREDICTED ARRIVAL RATE AND ACTUAL RATE OBSERVED

QUARTER     PREDICTION     OBSERVED

II-72          777            860
III-72         751            791
IV-72          719            798
I-73          1015            998
II-73         1017            896
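
The size of the prediction errors in Table XV can be summarized in a few lines. The mean absolute error used below is an illustrative summary measure and is not a statistic reported in the text.

```python
# Predicted and observed numbers of new accounts, copied from Table XV.
predicted = {"II-72": 777, "III-72": 751, "IV-72": 719, "I-73": 1015, "II-73": 1017}
observed = {"II-72": 860, "III-72": 791, "IV-72": 798, "I-73": 998, "II-73": 896}

# Signed error (prediction minus observation) for each quarter.
errors = {quarter: predicted[quarter] - observed[quarter] for quarter in predicted}
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)

for quarter, error in errors.items():
    print(f"{quarter}: {error:+d} ({100 * error / observed[quarter]:+.1f}%)")
print(f"mean absolute error: {mean_abs_error:.1f} accounts per quarter")
```
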
10. Predictors of the Probabilities of a New Account Entering
Each Class

The estimates of the probability of a new account entering
each class obtained for Quarter II-71 through Quarter II-73 were collected
together. They were transformed into logits and regressed against the
set of exogenous variables listed in sub-section 8. Using the criteria
given in sub-section 8 to determine if the exogenous variables in a regression
could explain the variance of the logits, only four predictors
were accepted. They are:



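$$\text{logit} = 2.217 - \underset{(0.029)}{0.082}\,X_1 - \underset{(0.180)}{0.466}\,X_{?}$$

$$\text{logit} = 6.187 - \underset{(0.029)}{0.089}\,X_{?} - \underset{(0.246)}{0.979}\,X_{?}$$

$$\text{logit} = -4.482 - \underset{(0.049)}{0.184}\,X_{?} + \underset{(1.073)}{3.053}\,X_{?}$$

$$\text{logit} = -10.725 + \underset{(0.094)}{0.184}\,X_5 + \underset{(0.332)}{0.998}\,X_{?}$$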



The standard error of each coefficient is contained in the bracket below 
each coefficient. 

The logits are transformed back to estimates of probabilities 
by: 

$$p = \frac{10^{L}}{1.0 + 10^{L}}$$

where L denotes the predicted logit and p the estimated probability. 

Logarithms to the base of 10 were used in both the forward 
transformation and the inverse transformation. The base of the logarithm 
does not affect the results of the regressions. 
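
The forward and inverse transformations described above can be sketched as follows. This is a minimal illustration only; the function names and the sample probability are assumptions made for the example and do not come from the thesis.

```python
import math


def to_logit10(p):
    """Forward transform: base-10 logit of a probability with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log10(p / (1.0 - p))


def from_logit10(v):
    """Inverse transform: 10**v / (1.0 + 10**v)."""
    t = 10.0 ** v
    return t / (1.0 + t)


# Illustrative value only (not taken from the data).
p = 0.25
v = to_logit10(p)
p_back = from_logit10(v)
print(f"logit = {v:.4f}, recovered probability = {p_back:.4f}")
```

Because the same base is used in both directions, the round trip returns the original probability whatever base is chosen, which is why the choice of base 10 has no effect on the fitted probabilities.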

Predictions of the number of new accounts in each class 
were checked by means of the chi square test. The number of degrees 
of freedom of the distribution was thirty and the ninetieth percentile of 
the distribution is 40.26. The chi square statistic obtained was 36.87. 
Thus, the hypothesis that the predicted distributions matched the 
observations could not be rejected. 

The predicted arrival distributions for Quarters II-72 to II-73 
are contained in Appendix G. 

IV. MODEL VALIDATION 



A. VALIDATION OF MODEL I 

1. Prediction of Sample Population Behavior 

As there were no entries into the sample population, changes 
to the structure were caused by accounts moving between classes and by 
accounts closing. Thus the basic Markov chain model could be used to 
model the behavior of this population. 

It was decided to use the data from the five quarters, Quarter 
I-71 through Quarter I-72, to estimate the time stationary transition matrix 
and then use the matrix to predict the structure of the sample population 
for Quarter II-72 through Quarter II-73. Predictions could then be 
compared against observations and the chi-square test used to determine 
the goodness of fit. 

CPM V, the estimate of the time stationary transition matrix 
with the first five quarters' data, was used to predict the number of accounts 
in each class and the amount of savings in each class. The results of the 
predictions of the number of accounts are contained in Table XVI. The 
actual number observed and the chi-square statistic for each class are 
presented next to the predictions. 

The predictions were expected to diverge more and more from 
observations as time progressed, as errors would accumulate. The 
chi-square statistic for the first prediction was 3.49 and the value for the 
fifth prediction was 11.91. These correspond to the fourth percentile and 
the seventieth percentile of the chi-square distribution with ten degrees 
of freedom. The predicted distribution after five quarters still provided 
a reasonably good fit to the observations. 
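
As a rough check on the percentiles quoted above, the chi-square distribution function has a closed form when the number of degrees of freedom is even. The sketch below uses that closed form; the function name is an assumption made for the example, and a statistical table or library would normally be used instead.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square CDF for an even number of degrees of freedom.

    Uses 1 - exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total


first = chi2_cdf_even(3.49, 10)    # first prediction
fifth = chi2_cdf_even(11.91, 10)   # fifth prediction
print(f"first: {100 * first:.1f}th percentile, fifth: {100 * fifth:.1f}th percentile")
```

Both values land close to the percentiles cited in the text, and the same function reproduces the tabulated ninetieth percentile of 15.99 for ten degrees of freedom.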

The predicted amount of savings in each class and the actual 
amount observed are presented in Table XVII. The predictions did not 
match the observations as well as the predictions of number of accounts. 
The error in prediction of total amount of savings amounted to 10.6 percent 
after five quarters. The difference between predicted total amount of 
savings and the amount observed could be explained by the fact that the 
predicted number of accounts for the larger classes, class VII to class XI, 
were generally smaller than the number observed. The error in the number 
of accounts, though relatively insignificant in absolute magnitude, when 
multiplied by the average amount of savings would amount to a substantial 
sum. Thus the estimates of transition probabilities between classes with 
low average amount of savings per account and those with high average 
amount of savings per account would have to be precise to yield more 
accurate predictions of total amount of savings. 

A relatively small number of large accounts can increase the 
variability of total amount of savings significantly. The error in prediction 
for Quarter II-73 amounted to about four hundred and fifty six thousand 
dollars. Of this amount four hundred and forty two thousand dollars were 
contributed by twenty two accounts in classes VIII, IX, X and XI. It 
appears that there is no easy way to reduce the variability 
in total amount of savings caused by this small group of savers. 

TABLE OF PREDICTED NUMBER OF ACCOUNTS, ACTUAL NUMBER OBSERVED AND CHI-SQUARE STATISTICS 

If the time stationary assumption is not violated then it is 
legitimate to estimate the transition matrix with data from the entire period 
of observation. The increase in data should yield better estimates of 
transition probabilities. Thus CPM X, the transition matrix estimated 
with all ten quarters' data, was used in predicting the number of accounts 
and the amount of savings in each class. The results are presented in 
Appendix H. 

To demonstrate the importance of data on predictions, CPM II, 
the transition matrix estimated with data from Quarter I-71 and Quarter 
II-71, was also used to predict the number of accounts and the amount of 
savings in each class. The results are also presented in Appendix H. 

The chi-square statistics obtained using CPM V, CPM II 
and CPM X are compared in the following table: 

TABLE XVIII 

COMPARISON OF CHI SQUARE STATISTICS OBTAINED 
WITH CPM V, CPM II AND CPM X 

                           QUARTER 
MATRIX     II-72    III-72    IV-72     I-73    II-73 
CPM V       3.49      2.45    11.05    10.74    11.91 
CPM II      7.59     13.84    35.67    51.11    65.97 
CPM X       3.26      1.12     5.93     5.03     3.57 



The tenth percentile and the ninetieth percentile of the chi 
square distribution with ten degrees of freedom are $P_{10} = 4.87$ and 
$P_{90} = 15.99$. 

Using $P_{90}$ as a criterion to determine if the fit is acceptable, 
it can be seen that predictions with CPM V and CPM X passed the test 
for the entire period of prediction whereas predictions with CPM II were 
only acceptable for the first two periods. 
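
The acceptance rule can be applied mechanically to the statistics of Table XVIII. The following is a small sketch; the threshold of 15.99 is the ninetieth percentile for ten degrees of freedom quoted later in this chapter, and the variable and function names are assumptions made for the example.

```python
P90 = 15.99  # ninetieth percentile, chi-square with ten degrees of freedom

QUARTERS = ["II-72", "III-72", "IV-72", "I-73", "II-73"]

# Chi-square statistics from Table XVIII.
CHI_SQUARE = {
    "CPM V": [3.49, 2.45, 11.05, 10.74, 11.91],
    "CPM II": [7.59, 13.84, 35.67, 51.11, 65.97],
    "CPM X": [3.26, 1.12, 5.93, 5.03, 3.57],
}


def acceptable_quarters(stats, limit=P90):
    """Return the quarters whose statistic does not exceed the limit."""
    return [q for q, s in zip(QUARTERS, stats) if s <= limit]


accepted = {name: acceptable_quarters(stats) for name, stats in CHI_SQUARE.items()}
for name, quarters in accepted.items():
    print(f"{name}: acceptable in {quarters}")
```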

The total amount of savings predicted using CPM V, CPM II 
and CPM X are compared in the following table: 

TABLE XIX 

COMPARISON OF TOTAL AMOUNT OF SAVINGS 
OBTAINED WITH CPM V, CPM II AND CPM X ($M) 

                           QUARTER 
MATRIX     II-72    III-72    IV-72     I-73    II-73 
CPM V      3.672     3.509    3.351    3.201    3.057 
CPM II     3.553     3.293    3.060    2.851    2.664 
CPM X      3.727     3.613    3.500    3.389    3.280 
ACTUAL     3.627     3.535    3.509    3.404    3.418 



The superiority of predictions with CPM X is apparent. The 
percentage error in predicting the total amount of savings of Quarter II-73 
is 4.0 percent, which is less than half of that obtained using CPM V. The 
importance of accurate estimates of transition probabilities is clearly 
demonstrated by the above comparisons. 
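
The percentage errors quoted for the final quarter can be recomputed from the last column of Table XIX. This is an illustrative calculation only; the names used are assumptions made for the example.

```python
ACTUAL_II_73 = 3.418  # observed total savings in the sample ($M), Table XIX

# Predicted total savings for Quarter II-73 ($M), Table XIX.
PREDICTED_II_73 = {"CPM V": 3.057, "CPM II": 2.664, "CPM X": 3.280}


def percent_error(predicted, actual):
    """Absolute error as a percentage of the observed value."""
    return 100.0 * abs(predicted - actual) / actual


errors = {name: percent_error(value, ACTUAL_II_73)
          for name, value in PREDICTED_II_73.items()}
for name, err in errors.items():
    print(f"{name}: {err:.1f} percent")
```

The results agree with the 10.6 percent reported for CPM V and the 4.0 percent reported for CPM X, and show that the matrix estimated from only two quarters is far worse than either.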

2. Prediction of Behavior of Population 

To predict the behavior of the entire population the model has 
to include the process of arrivals and entrants. As the sample size was 
small (about 3.5% of the population) it was decided to use the entire data 
base to estimate the transition matrix. 

The average arrival rate (number of new accounts opened per 
quarter) was found to be 800.7 and the distribution of new accounts was 
estimated to be as follows: 

CLASS      p_j 
II         0.7813 
III        0.0680 
IV         0.0484 
V          0.0187 
VI         0.0124 
VII        0.0156 
VIII       0.0089 
IX         0.0111 
X          0.0076 
XI         0.0280 

The estimates were obtained by adding up the number of new accounts in 
each class over the period of observation and dividing by the total number 
of new accounts sampled. 

The number of accounts in each class was predicted by 
adding the expected number of accounts moving into or remaining in that 
class from the population of accounts already in the system and the 
number of new accounts entering that class. The expression used in the 
computation can be found in Section C of Chapter II. 
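
The bookkeeping described above can be sketched as follows. The expression in Chapter II is not reproduced in this chapter, so the update rule shown here, n_j(t+1) = sum_i n_i(t) P_ij + a_j, is an assumed form consistent with the description. The two-class transition matrix and account counts are hypothetical values chosen only to illustrate the step; the arrival rate and arrival distribution are those given in the text.

```python
ARRIVAL_RATE = 800.7  # average number of new accounts per quarter

# Estimated probability of a new account entering each class.
ARRIVAL_DIST = {
    "II": 0.7813, "III": 0.0680, "IV": 0.0484, "V": 0.0187, "VI": 0.0124,
    "VII": 0.0156, "VIII": 0.0089, "IX": 0.0111, "X": 0.0076, "XI": 0.0280,
}

# Expected number of new accounts entering each class per quarter.
expected_new = {cls: ARRIVAL_RATE * p for cls, p in ARRIVAL_DIST.items()}


def step(counts, transition, new_accounts):
    """One-quarter update: accounts moving into or remaining in class j,
    plus new accounts entering class j."""
    k = len(counts)
    return [
        sum(counts[i] * transition[i][j] for i in range(k)) + new_accounts[j]
        for j in range(k)
    ]


# Hypothetical two-class illustration: each row sums to 0.95, so five
# percent of the accounts in each class close during the quarter.
toy_transition = [[0.90, 0.05],
                  [0.10, 0.85]]
toy_counts = [1000.0, 200.0]
toy_new = [40.0, 10.0]
toy_next = step(toy_counts, toy_transition, toy_new)
print(toy_next)
```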

The predicted total number of accounts and the total amount 
of savings are shown in the following table: 






TABLE XX 

PREDICTED TOTAL NUMBER OF ACCOUNTS AND 
TOTAL AMOUNT OF SAVINGS AND OBSERVED VALUES 

                                        QUARTER 
                           II-72   III-72    IV-72     I-73    II-73 
TOTAL NUMBER      PRED.    17345    17447    17557    17664    17776 
OF ACCOUNTS       ACT.     17354    17483    17485    17746    17820 

TOTAL AMOUNT OF   PRED.    45.65    49.87    53.78    57.39    60.74 
SAVINGS ($M)      ACT.     41.57    42.15    42.40    44.13    44.56 



The maximum error in predicting the total number of accounts 
was 82, which was about half a percent of the total number of accounts. 
This indicated that the process of arrivals and the process of departures 
were probably as described by the model during the period of prediction. 

The failure of the model to predict the total amount of savings 
could be due to the failure of the model to predict the structure of the 
population or to a violation of the assumption of a constant average amount 
of savings in each class. 

To test the hypothesis that the error in total amount of savings 
was caused by error in predicting the number of accounts in each class, a 
sample comprising one-fourth of the population at Quarter I-73 was taken 
and compared with the predicted structure of active accounts. The 
chi square test was used to determine the goodness of fit between the 
predicted distribution and the distribution of the sample. 

The number of degrees of freedom of the distribution of the 
chi square statistic is eight and the ninetieth percentile of the distribution 
is 13.36. The chi square statistic obtained was 111.0; thus, the null 
hypothesis that the predicted distribution and the distribution of the sample 
were the same could be rejected. 

In examining the chi square statistic of each class it was 
found that major sources of error came from Classes II and III, IV, V, VII 
and XI (Classes II and III had been combined to ease the burden of 
extracting data for the validation sample). It appeared that Classes IV, V, 
VII and IX became much larger at the expense of Classes II and III. This 
would account for the high predictions of total amount of savings. 
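
The class-by-class contributions to the chi square statistic can be recomputed from the predicted and actual numbers of accounts listed in Table XXI. The sketch below assumes the usual form of the statistic, the sum over classes of (observed - predicted)^2 / predicted; the names are assumptions made for the example.

```python
# Predicted and actual numbers of accounts in the validation sample (Table XXI).
PREDICTED = {"II & III": 3435, "IV": 342, "V": 180, "VI": 95, "VII": 138,
             "VIII": 72, "IX": 52, "X": 48, "XI": 122}
OBSERVED = {"II & III": 3699, "IV": 222, "V": 144, "VI": 103, "VII": 93,
            "VIII": 60, "IX": 41, "X": 50, "XI": 70}

# Contribution of each class to the chi square statistic.
contribution = {
    cls: (OBSERVED[cls] - PREDICTED[cls]) ** 2 / PREDICTED[cls]
    for cls in PREDICTED
}
chi_square = sum(contribution.values())
degrees_of_freedom = len(PREDICTED) - 1

for cls, value in sorted(contribution.items(), key=lambda kv: -kv[1]):
    print(f"{cls:9s} {value:6.2f}")
print(f"chi square = {chi_square:.1f} with {degrees_of_freedom} degrees of freedom")
```

With the rounded counts of Table XXI the total comes out close to the reported 111.0, and the largest contributions come from the same classes named in the text.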

Another check was made by taking the difference between the 
predicted number of accounts in the sample and the actual number of 
accounts in each class and multiplying by the respective average amount 
of savings of each class. The errors in the amount of savings in each 
class are shown in Table XXI. 

If the validation sample could be taken as a good 
representation of the population then the error in prediction of the 
population could be estimated by multiplying the error in the amount of 
savings in the validation sample by four. Thus, the prediction of total 
amount of savings would be high by $11.2 million. The observed error of 
$13.3 million could therefore be considered to be mainly the result of 
errors in predicting the structure of the population. 
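
The scaling argument can be reproduced from the last column of Table XXI and the Quarter I-73 entries of Table XX. This is an illustrative calculation; the variable names are assumptions made for the example.

```python
# Error in amount of savings by class in the validation sample ($), Table XXI.
SAMPLE_ERRORS = {
    "II & III": -182759, "IV": 589920, "V": 246780, "VI": -71240,
    "VII": 484065, "VIII": 155040, "IX": 164571, "X": -35582, "XI": 1450175,
}
SAMPLE_FRACTION = 0.25  # the sample comprised one-fourth of the population

sample_error = sum(SAMPLE_ERRORS.values())
population_error = sample_error / SAMPLE_FRACTION

# Observed error at Quarter I-73: predicted minus actual total savings (Table XX).
observed_error = (57.39 - 44.13) * 1e6

share_explained = population_error / observed_error
share_class_xi = SAMPLE_ERRORS["XI"] / sample_error

print(f"scaled error: ${population_error / 1e6:.1f} million")
print(f"share of observed error explained: {share_explained:.0%}")
print(f"share of sample error due to Class XI: {share_class_xi:.0%}")
```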

Looking at the error in the prediction of amount of savings of 
each class, it can be seen that Class XI is a major contributor to the total 
error. It was suspected that the model failed because of sampling errors 
which resulted in estimating higher probabilities of transition between 
classes with low average amount of savings and those with large average 
amount of savings. 

TABLE XXI 

ERRORS IN PREDICTING THE AMOUNT OF SAVINGS 
IN THE VALIDATION SAMPLE 

             PREDICTED    ACTUAL #    ERROR IN    ERROR IN AMOUNT 
CLASS        # OF A/C     OF A/C      # OF A/C    OF SAVINGS ($) 
II & III       3435         3699        -264          -182759 
IV              342          222        +120          +589920 
V               180          144        + 36          +246780 
VI               95          103        -  8          - 71240 
VII             138           93        + 45          +484065 
VIII             72           60        + 12          +155040 
IX               52           41        + 11          +164571 
X                48           50        -  2          - 35582 
XI              122           70        + 52         +1450175 
TOTAL          4484*        4482        +  2**       +2800971 

* Should equal 4482. Discrepancy caused by rounding error 
** Should equal 0. Discrepancy caused by rounding error 

To check this hypothesis the following changes were made 
to CPM X: 

1. Accounts found to have made two or more transitions 
between Classes II, III, IV and V and Classes VIII, IX, X and XI were 
removed from the data base as these accounts would not be representative 
of the normal behavior of the population. Eight accounts were rejected 
according to this rule and CPM X was recomputed with the remaining six 
hundred and fourteen accounts. This modified transition matrix was termed 
MOD I. 

2. The 90% lower confidence limit was estimated for 
transition probabilities from Classes II, III, IV and V to higher classes. The 
Poisson distribution was used to approximate the binomial distribution 
in cases when the total number of transitions observed was below seven. 
The normal approximation was used when the number of transitions observed 
exceeded seven. This modification was applied to MOD I and termed 
MOD II. 

3. Further adjustments were made to a few transition 
probabilities based on the results of the chi square fit using MOD I and 
MOD II. The rationale for the adjustments is as follows: 

Since the data base of accounts is inadequate for estimation 
of population parameters, use the additional data available from the 
validation sample to correct the estimation of certain parameters. 
Hypothesize that the new matrix, termed MOD III, is the best estimate and 
proceed with the prediction of total number of accounts and total amount of 
savings in the institution. A good fit between predicted and observed total 
amount of savings over the prediction interval would give support to the 
hypothesis. 
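
The second modification can be illustrated with a short sketch of a one-sided 90% lower confidence limit for a transition probability estimated as k transitions out of n opportunities. The thesis does not give the exact formulas used, so the details here are assumptions: the Poisson limit is found by bisection and divided by n, the normal limit is the estimate minus 1.2816 standard errors, and a count of exactly seven (which the text leaves unspecified) is handled by the normal approximation. All function names are hypothetical.

```python
import math

ALPHA = 0.10     # one-sided 90% lower limit
Z_90 = 1.2816    # standard normal quantile for the one-sided 90% limit
SMALL_COUNT = 7  # below this count, use the Poisson approximation


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return 1.0 - cdf


def poisson_lower_limit(k):
    """Mean lam at which P(X >= k | lam) = ALPHA, found by bisection."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, float(k)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_upper_tail(k, mid) < ALPHA:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lower_limit(k, n):
    """Assumed form of the 90% lower limit for a transition probability k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n and n > 0")
    if k < SMALL_COUNT:
        return min(1.0, poisson_lower_limit(k) / n)
    p = k / n
    return max(0.0, p - Z_90 * math.sqrt(p * (1.0 - p) / n))


# Hypothetical counts, for illustration only.
print(lower_limit(3, 400), lower_limit(20, 400))
```

Replacing a point estimate by its lower limit shrinks the estimated probability of moving from a low class to a higher one, which is the direction of adjustment the text describes.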

MOD I, MOD II and MOD III are contained in Appendix E. 

The results obtained using the modified matrices are compared 
against predictions using CPM X in Tables XXII and XXIII. 

COMPARISON OF PREDICTIONS OF SIZE DISTRIBUTION OF VALIDATION 
SAMPLE BY MODEL I, USING CPM X, MOD I, MOD II AND MOD III 

It can be seen from Table XXII that the structure of the 
predicted distribution changed substantially with each modification. The 
improvement in fit in the predicted distribution with each modification 
had a corresponding effect in the prediction of total amount of savings. 
However, the predicted total number of accounts was marginally degraded 
by each modification. The changes, however, were not considered to be 
significant as the percentage error was still less than one percent. 



Though the modifications to the transition matrix improved the 
predictions they do not prove that the true transition matrix should be as 
specified by MOD III. However, with the amount of information available 
the best estimate of the transition matrix is MOD III. Although its ability 
to predict the structure of the population has not been put to a test, the 
accurate prediction of total amount of savings encourages one to believe 
that MOD III is close to the true matrix. 

TABLE XXIII 

MODEL I PREDICTIONS OF TOTAL NUMBER OF ACCOUNTS AND AMOUNT 
OF SAVINGS ($M) USING CPM X, MOD I, MOD II AND MOD III 

                                     QUARTER 
                        II-72   III-72    IV-72     I-73    II-73 
TOTAL       CPM X       17345    17447    17554    17664    17776 
NUMBER      MOD I       17336    17428    17525    17625    17726 
OF          MOD II      17335    17424    17516    17609    17702 
ACCOUNTS    MOD III     17329    17405    17408    17552    17622 
            ACTUAL      17354    17483    17485    17746    17820 

TOTAL       CPM X       45.65    49.87    53.78    57.39    60.74 
AMOUNT      MOD I       44.64    47.97    51.06    53.97    56.67 
OF          MOD II      43.00    44.80    46.43    47.96    49.38 
SAVINGS     MOD III     41.94    42.74    43.48    44.16    44.79 
            ACTUAL      41.57    42.15    42.40    44.13    44.56 

3. Estimates of the Fundamental Matrix 



The fundamental matrix $(I - Q)^{-1}$ was estimated by substituting 
Q from CPM X into the expression. It is displayed in Table XXIV. 
The ij-th element of this matrix is the expected number of time 
periods that a new account beginning in Class i will spend in Class j 
before closing. Thus a new account joining, say, Class IV will on the 
average visit Class V for 2.4562 periods during its entire life in the system. 

The expected total time a new account which joins Class i 
spends in the system is the sum of the ith row of the fundamental matrix, 
M. 

The equilibrium distribution is obtained by multiplying the 
distribution of arrivals by M. The results obtained are presented in Table 
XXVI. Results obtained using MOD III are also presented. 
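
The three uses of the fundamental matrix described above can be sketched on a small example. The two-class matrix Q and the arrival vector below are hypothetical values chosen so that the result is easy to verify by hand; they are not taken from CPM X. The sketch also assumes that "distribution of arrivals" means the expected number of new accounts entering each class per period.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fundamental_matrix(q):
    """M = (I - Q)^-1, where Q holds transitions among the open classes."""
    n = len(q)
    return invert([[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                   for i in range(n)])


# Hypothetical Q: rows sum to less than one, the remainder being the
# probability that an account closes during the period.
Q = [[0.50, 0.25],
     [0.00, 0.50]]
M = fundamental_matrix(Q)

# Expected total time in the system for an account starting in each class.
length_of_stay = [sum(row) for row in M]

# Equilibrium number of accounts in each class for a constant arrival vector.
arrivals = [10.0, 0.0]
equilibrium = [sum(arrivals[i] * M[i][j] for i in range(len(M)))
               for j in range(len(M))]

print(M, length_of_stay, equilibrium)
```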

The results are interesting in that they are predictions of the 
final state of the population if current conditions were to prevail. This 
state of equilibrium is reached when the number of new accounts opened 
per quarter balances the number of accounts closed, and the number of 
accounts moving out of each class is balanced by a corresponding number 
of accounts moving in from other classes. The fundamental matrix obtained 
with CPM X predicts that the population will grow from 17251, at Quarter 
I-72, to a final value of 21734. The population of each class grows 
larger except for Class II. However, as noted earlier, CPM X did not 
predict the total amount of savings accurately; therefore, projection of 
the equilibrium distribution using it has little value except to contrast 
with the results obtained with MOD III. 

THE FUNDAMENTAL MATRIX OBTAINED WITH CPM 

THE FUNDAMENTAL MATRIX OBTAINED WITH MOD III 

ESTIMATES OF EQUILIBRIUM DISTRIBUTION AND AMOUNT OF SAVINGS 
IN EACH CLASS AND OBSERVED VALUES AT QUARTER I-72 

The fundamental matrix obtained with MOD III produced more 
believable predictions. It predicted that the total number of 
accounts will grow to a maximum of 19363 and each class grows larger 
at the same time. The equilibrium amount of savings in the population 
will be $53.74 million. Thus, if current conditions prevail, the 
institution can expect a growth of another $10 million, from the current 
level of $44 million (as at 30 June 1973), in the passbook accounts. 

The population under consideration, however, did not include 
accounts greater than $100,000. A separate study will therefore be required 
to predict the equilibrium number of accounts in this group of accounts, 
which numbered six at Quarter I-72. 

The expected lengths of stay of accounts in the system are 
presented in the following table: 

TABLE XXVII 

EXPECTED LENGTH OF STAY IN THE SYSTEM 
COMPUTED WITH CPM X AND MOD III 

           LENGTH OF STAY IN SYSTEM 
                  (QUARTERS) 
CLASS        CPM X       MOD III 
II             26           23 
III            29           27 
IV             29           26 
V              29           27 
VI             29           27 
VII            29           27 
VIII           31           29 
IX             31           29 
X              32           30 
XI             33           31 






The expected length of stay in the system is almost constant 
for all the classes except for Classes II and XI. The conclusion that can 
be drawn from this observation is that the length of stay of a saver in 
the system is relatively indifferent to the amount of savings he started 
out with. The shorter life of accounts in Class II is a fact that has been 
noticed previously. The longer life of accounts in Class XI is contrary to 
expectation, as one would expect savers who do not have immediate need 
for such large sums to transfer the passbook account into other types of 
savings account which yield higher earnings. The observation may be 
explained if these savers do not close their account when funds are 
transferred to other types of accounts. The length of stay would then reflect 
the length of time a saver wishes to remain a customer of the savings 
institution. The fundamental matrix using CPM X predicts, on the average, 
lengths of stay of 29.8 periods whereas the fundamental matrix using 
MOD III predicts 27.6 periods. The smaller total number of accounts 
predicted using MOD III can be explained by the fact that customers spend 
less time in the system. 

Thus, the model shows that efforts to keep customers in the 
system are as important as attracting new customers into the system. 
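
The link between length of stay and equilibrium size can be checked numerically. At equilibrium the total number of accounts should be roughly the arrival rate multiplied by the expected length of stay of a new account, weighted by the class in which it arrives. The sketch below uses the arrival rate and arrival distribution given earlier for Model I and the whole-quarter figures of Table XXVII, so it is only an approximate consistency check; the names are assumptions made for the example.

```python
ARRIVAL_RATE = 800.7  # new accounts per quarter

# Arrival probabilities and expected lengths of stay for Classes II to XI.
ARRIVAL_DIST = [0.7813, 0.0680, 0.0484, 0.0187, 0.0124,
                0.0156, 0.0089, 0.0111, 0.0076, 0.0280]
STAY_CPM_X = [26, 29, 29, 29, 29, 29, 31, 31, 32, 33]
STAY_MOD_III = [23, 27, 26, 27, 27, 27, 29, 29, 30, 31]


def mean(values):
    """Unweighted average over the classes."""
    return sum(values) / len(values)


def equilibrium_total(rate, dist, stay):
    """Arrival rate times the arrival-weighted expected length of stay."""
    return rate * sum(p * s for p, s in zip(dist, stay))


total_cpm_x = equilibrium_total(ARRIVAL_RATE, ARRIVAL_DIST, STAY_CPM_X)
total_mod_iii = equilibrium_total(ARRIVAL_RATE, ARRIVAL_DIST, STAY_MOD_III)

print(mean(STAY_CPM_X), mean(STAY_MOD_III))
print(round(total_cpm_x), round(total_mod_iii))
```

The unweighted averages reproduce the 29.8 and 27.6 periods quoted in the text, and the two totals fall within about two percent of the reported equilibrium populations of 21734 and 19363; the remaining gap is what one would expect from using lengths of stay rounded to whole quarters.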

B. VALIDATION OF MODEL II 

1. Prediction of Sample Population Behavior 

The transition matrices used in predicting the behavior of the 
sample were estimated by the method described in Chapter II, Section B. 8. 
The elements of the transition matrices that did not have predictors were 
taken from CPM V, the estimate of the time stationary transition matrix 
using data from the first five quarters. The predicted matrices are contained 
in Appendix F. The predicted number of accounts in each class was 
compared against the actual number observed. The chi square test was used 
to determine the goodness of fit between the predicted and observed 
distribution of accounts in the sample. 

The results are presented in Appendix I. It was found that 
the predictions matched the observations very closely for the first four 
quarters. The chi square statistic of each of the first four quarters was 
no more than 6.7. However, the predictions for the fifth quarter were 
extremely poor. The chi square statistic was 25.02. If the null hypothesis 
that the predicted and observed distributions are the same were true, then 
a chi square statistic at least this large would be obtained only about 
0.5 percent of the time. The null hypothesis could thus be safely rejected 
at the 10% level of significance. 
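
The tail probability quoted for the fifth quarter can be checked with the closed form of the chi-square upper tail for an even number of degrees of freedom. The sketch below assumes ten degrees of freedom, as used for the sample population elsewhere in this chapter; the function name is an assumption made for the example.

```python
import math


def chi2_upper_tail_even(x, df):
    """P(chi-square >= x) for a positive, even number of degrees of freedom.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


LEVEL = 0.10  # level of significance used in the text

tail_fifth_quarter = chi2_upper_tail_even(25.02, 10)
reject = tail_fifth_quarter < LEVEL
print(f"tail probability = {100 * tail_fifth_quarter:.2f} percent, reject = {reject}")
```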

An investigation of the causes of the failure of the model to 
predict accurately for Quarter II-73 showed that the ten predictions of 
transition probabilities for Quarter II-73 had altered the transition matrix 
for Quarter II-73 substantially. Two exogenous variables, the prime 
commercial paper rate (4-6 months) and $X_5$, the U. S. Government 
securities rate (6 months), were considerably higher in Quarter II-73 than 
in the earlier quarters. Thus the predictors were used beyond the range of 
the data base from which they were derived. This could lead to unexpected 
results. 
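
A simple guard against this kind of extrapolation is to compare each new value of an exogenous variable with the range observed in the fitting period before using a predictor. The sketch below is only an illustration: the rates are hypothetical numbers, not the values used in the thesis, and the function name is an assumption.

```python
def outside_fitted_range(fit_values, new_value):
    """True if new_value lies outside the range seen when fitting."""
    return new_value < min(fit_values) or new_value > max(fit_values)


# Hypothetical interest rates (percent) over the fitting period.
fit_rates = [4.5, 4.8, 5.1, 4.2, 4.9]

# Hypothetical rate for the quarter being predicted.
new_rate = 7.0

if outside_fitted_range(fit_rates, new_rate):
    print("warning: predictor is being used beyond its data base")
```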

To verify the hypothesis that Model II failed in Quarter II-73 
because of the use of some predictors beyond the data base on which they 
were derived, predictions were repeated using a matrix from which the 
predictors that had $X_5$ as an explanatory variable were removed. The chi 
square statistic obtained with this modified matrix was 14.87, a substantial 
improvement from that obtained without the modification. The ninetieth 
percentile of the chi square distribution with ten degrees of freedom is 
15.99. Thus the null hypothesis could not be rejected at the 10% level of 
significance. It was therefore concluded that the hypothesis on the failure 
of the model is correct. 

2. Prediction of Population Behavior 

The complete Model II was used in the prediction of the behavior 
of the population. The predicted number of new accounts opened in each 
quarter was computed in Chapter III, Section B. 9. The predicted number 
of new accounts entering each class was presented in Chapter III, Section 
B. 10. The transition matrix used was the same as that used in the 
prediction of sample population behavior in sub-section 1. 

With experience gained in earlier predictions with Model I, 
high predicted total amount of savings was expected. The modifications 
applied to the transition matrix of Model I were also applied to Model II. 
The predicted total number of accounts and total amount of savings are 
presented in Table XXVIII. 

The total number of accounts predicted by Model II matched 
the observed values closely for Quarters II-72, III-72 and IV-72, but 
diverged quite widely by Quarter II-73. The predicted total amount of 
savings was high but the divergence increased substantially in Quarter 
II-73. 

TABLE XXVIII 



PREDICTED TOTAL NUMBER OF ACCOUNTS AND TOTAL AMOUNT OF SAVINGS 
($M) BY MODEL II, USING CPM X, MOD I, MOD II AND MOD III 

                                     QUARTER 
                        II-72   III-72    IV-72     I-73    II-73 
TOTAL       CPM X       17307    17380    17448    17985    18547 
NUMBER OF   MOD I       17305    17374    17438    17973    18534 
ACCOUNTS    MOD II      17304    17370    17430    17966    18531 
            MOD III     17304    17364    17414    17953    18526 
            ACTUAL      17354    17483    17485    17746    17820 

TOTAL       CPM X       45.61    49.79    53.68    58.20    62.49 
AMOUNT OF   MOD I       44.59    47.88    50.98    54.80    58.47 
SAVINGS     MOD II      42.95    44.69    46.32    48.75    51.13 
(MILLION    MOD III     41.90    42.64    43.31    44.80    46.19 
DOLLARS)    ACTUAL      41.57    42.15    42.40    44.13    44.56 



The hypothesis that the model failed to yield accurate 
predictions because the predictors of transition probabilities were used 
beyond the range of data used to obtain the predictors was put to another 
test by predicting with a transition matrix from which the predictors with 
$X_5$ as an explanatory variable were removed. The predictions are 
presented in Table XXIX. 

It can be seen that the predicted total number of accounts has 
improved considerably by this change to the transition matrices. The 
improvement to predictions of total amount of savings is not so pronounced. 

The validation sample of 4483 accounts taken from the Quarter 
I-73 population was used to check if Model II predicted the population 
structure accurately. The predictions obtained with CPM X, MOD I, MOD 
II and MOD III are presented in Table XXX. Predictions by Model II' are 
presented in Table XXXI. 

TABLE XXIX 

PREDICTED TOTAL NUMBER OF ACCOUNTS AND TOTAL 
AMOUNT OF SAVINGS BY MODEL II' 

                                     QUARTER 
                        II-72   III-72    IV-72     I-73    II-73 
TOTAL       CPM X       17320    17373    17404    17738    18068 
NUMBER OF   MOD I       17310    17354    17375    17697    18016 
ACCOUNTS    MOD II      17310    17350    17365    17681    17992 
            MOD III     17309    17343    17346    17645    17937 
            ACTUAL      17354    17483    17485    17746    17820 

TOTAL       CPM X       45.62    49.69    53.34    57.44    61.23 
AMOUNT OF   MOD I       44.60    47.76    50.62    54.01    57.14 
SAVINGS     MOD II      42.96    44.58    46.00    48.04    49.92 
(MILLION    MOD III     41.90    42.55    43.06    44.28    45.37 
DOLLARS)    ACTUAL      41.57    42.15    42.40    44.13    44.56 



It can be seen that the predicted distribution improved with 
each modification. The error in predicting the total amount of savings 
can be attributed to the error in the prediction of number of accounts in 
each class. As an example, the error in predicting the number of accounts 
in Classes XI, VII and IV could account for $2.9 million in the prediction 
of total amount of savings for Quarter I-73 using MOD II. 

Though the predicted distribution using MOD III fitted the 
observed distribution very closely, the error in predicting the number of 
accounts in Class XI could account for $0.67 million of the error in 
predicting the total amount of savings for the entire population. This again 
demonstrates the importance of accurate predictions of number of accounts 
in classes with large average amount of savings. 

PREDICTED DISTRIBUTION OF ACCOUNTS IN THE VALIDATION SAMPLE, 
OBSERVED DISTRIBUTION AND CHI SQUARE STATISTICS BY MODEL II 

PREDICTED DISTRIBUTION OF ACCOUNTS IN THE VALIDATION SAMPLE, 
OBSERVED DISTRIBUTION AND CHI SQUARE STATISTICS BY MODEL II' 

C. COMPARISON OF MODEL I AND MODEL II 

1. Sample Population Behavior 

The chi square statistics obtained in the test of goodness of 
fit between the predicted distributions and the observed distribution were 
used as a measure of the predictive power of the two models. 

Model II' denotes Model II modified by the deletion of five 
predictors of transition probabilities which had $X_5$ as an explanatory 
variable. The chi square statistics obtained with Model I, Model II and 
Model II' are presented in Table XXXII. 

TABLE XXXII 

COMPARISON OF CHI SQUARE STATISTICS 
OBTAINED WITH MODELS I, II AND II' 

                                  QUARTER 
CPM    MODEL     II-72    III-72    IV-72     I-73    II-73 
V      I          3.49      2.45    11.05    10.74    11.91 
V      II         3.60      1.98     6.70     4.35    25.02 
V      II'        3.47      2.19     8.69     8.84    14.87 
II     I          7.59     13.84    35.67    51.11    65.97 
II     II         6.76     11.46    24.65    33.01    76.64 
II     II'        6.99     12.83    30.73    43.05    68.39 
X      I          3.26      1.12     5.93     5.03     3.57 
X      II         2.97      0.94     3.82     1.26    15.92 
X      II'        3.17      1.05     4.71     3.99     7.01 



Except for Quarter II-73, Model II was generally superior to 
Model I. Model II' improved the predictions for Quarter II-73 but did 
not perform as well as Model II for the other quarters. The results were 
expected as Model II, having greater flexibility, should perform better 
under normal situations. Model II', with only five predicted elements in 
its transition matrix, would be expected to be less responsive to changes 
in external conditions and thus would not perform as well as Model II. Model 
I, being completely indifferent to external conditions, should be expected 
to be the poorest performer among the three models. 
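
The ranking of the three models in each quarter can be read off Table XXXII mechanically by picking the smallest chi square statistic. The sketch below does this for each estimate of the transition matrix; the data are those of Table XXXII and the names are assumptions made for the example.

```python
QUARTERS = ["II-72", "III-72", "IV-72", "I-73", "II-73"]
MODELS = ("I", "II", "II'")

# Chi square statistics from Table XXXII, keyed by (CPM, model).
TABLE_XXXII = {
    ("V", "I"): [3.49, 2.45, 11.05, 10.74, 11.91],
    ("V", "II"): [3.60, 1.98, 6.70, 4.35, 25.02],
    ("V", "II'"): [3.47, 2.19, 8.69, 8.84, 14.87],
    ("II", "I"): [7.59, 13.84, 35.67, 51.11, 65.97],
    ("II", "II"): [6.76, 11.46, 24.65, 33.01, 76.64],
    ("II", "II'"): [6.99, 12.83, 30.73, 43.05, 68.39],
    ("X", "I"): [3.26, 1.12, 5.93, 5.03, 3.57],
    ("X", "II"): [2.97, 0.94, 3.82, 1.26, 15.92],
    ("X", "II'"): [3.17, 1.05, 4.71, 3.99, 7.01],
}


def best_model(cpm, quarter_index):
    """Model with the smallest chi square statistic for one CPM and quarter."""
    candidates = {m: TABLE_XXXII[(cpm, m)][quarter_index] for m in MODELS}
    return min(candidates, key=candidates.get)


for cpm in ("V", "II", "X"):
    winners = [best_model(cpm, i) for i in range(len(QUARTERS))]
    print(f"CPM {cpm}: {dict(zip(QUARTERS, winners))}")
```

On these figures Model II gives the best fit in most of the first four quarters, while in Quarter II-73 Model I has the smallest statistic for every matrix, which is the exception noted in the text.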

The total amounts of savings predicted by Models I, 
II and II' are presented in Table XXXIII. 

TABLE XXXIII 

COMPARISON OF PREDICTED TOTAL AMOUNT OF 
SAVINGS ($M) BY MODELS I, II AND II' 

CPM    MODEL     II-72    III-72    IV-72     I-73    II-73 
V      I         3.672     3.509    3.351    3.201    3.057 
V      II        3.674     3.507    3.346    3.201    3.043 
V      II'       3.669     3.497    3.329    3.180    3.024 
II     I         3.553     3.293    3.060    2.851    2.664 
II     II        3.574     3.329    3.114    2.937    2.764 
II     II'       3.567     3.315    3.092    2.902    2.713 
X      I         3.727     3.613    3.500    3.389    3.280 
X      II        3.729     3.609    3.488    3.373    3.234 
X      II'       3.726     3.603    3.479    3.367    3.241 
ACTUAL           3.627     3.535    3.509    3.404    3.418 



The predictions of the three models were quite close. 
In view of the variability of the predictions of total amount of savings it 
was not possible to state which of the three models performed better. 

2. Behavior of Entire Population 

Both models predicted total number of accounts very closely 
for the first three quarters. The performance of Model II deteriorated 
badly in the fifth quarter, Quarter II-73. The failure of Model II in 
Quarter II-73 was attributed to the failure of the predictors of transition 
probabilities to predict beyond the data base from which they were derived. 
Predictions made with a matrix modified by the removal of predictors which 
had $X_5$ as an explanatory variable were closer to the actual value for 
Quarters I-73 and II-73 than predictions by Model II. Table XXXIV compares 
the total number of accounts predicted by Model I, Model II and Model II' 
(Model II modified as described above). 

[Figure: total amount of savings in the sample, in million dollars] 



TABLE XXXIV

TOTAL NUMBER OF ACCOUNTS PREDICTED BY
MODEL I, MODEL II AND MODEL II' USING MOD III

                 MODEL    II-72    III-72   IV-72    I-73     II-73

TOTAL            I        17329    17405    17480    17552    17622
NUMBER OF        II       17304    17364    17414    17953    18526
ACCOUNTS         II'      17309    17343    17346    17645    17937

                 ACTUAL   17354    17483    17485    17746    17820



It can be seen that Model I predictions are closer to the
observed values for the first three quarters. However, unlike Models
II and II', Model I could not predict the sudden increase in the number
of accounts in Quarter I-73. This, again, shows that Model I is
applicable only when external conditions remain constant.

Both models were equally poor in predicting the total amount
of savings. The cause of the failure was attributed to sampling errors.
Similar modifications were made to the transition matrix of both models.
The improvement finally achieved was substantial, as can be seen in the
following table:



TABLE XXXV

COMPARISON OF TOTAL AMOUNT OF SAVINGS PREDICTED
BY MODELS I, II AND II' FOR QUARTER II-73

                 MODEL    CPM X    MOD I    MOD II   MOD III  ACTUAL

TOTAL            I        60.74    56.67    49.38    44.79    44.56
AMOUNT OF        II       62.49    58.47    51.13    46.19    44.56
SAVINGS          II'      61.23    57.14    49.92    45.37    44.56



Predictions using CPM X, MOD I and MOD II are so different
from the observations that the differences between Model I and Model II'
predictions are considered insignificant. In the case of predictions made
using MOD III, the errors between prediction and observation are too small
to discriminate between Model I and Model II' using just one point. Thus,
Table XXXVI, comparing the predictions of the three models using MOD III
over the entire period of prediction, is presented below.

TABLE XXXVI

COMPARISON OF TOTAL AMOUNT OF SAVINGS PREDICTED
BY MODELS I, II AND II' USING MOD III

                 MODEL    II-72    III-72   IV-72    I-73     II-73

TOTAL            I        41.94    42.74    43.48    44.16    44.79
AMOUNT OF        II       41.90    42.64    43.31    44.80    46.19
SAVINGS          II'      41.90    42.55    43.06    44.28    45.37

                 ACTUAL   41.57    42.15    42.40    44.13    44.56
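
The comparison in Table XXXVI can be restated as percentage errors. The following is a minimal sketch, using only the values printed in the table; the function and variable names are illustrative and do not come from the thesis.

```python
# Values copied from Table XXXVI (total amount of savings, MOD III).
QUARTERS = ["II-72", "III-72", "IV-72", "I-73", "II-73"]
ACTUAL = [41.57, 42.15, 42.40, 44.13, 44.56]
PREDICTED = {
    "I": [41.94, 42.74, 43.48, 44.16, 44.79],
    "II": [41.90, 42.64, 43.31, 44.80, 46.19],
    "II'": [41.90, 42.55, 43.06, 44.28, 45.37],
}


def percent_errors(predicted, actual):
    """Signed percentage error of each prediction relative to the observed value."""
    return [100.0 * (p - a) / a for p, a in zip(predicted, actual)]


def max_abs_percent_error(predicted, actual):
    """Largest absolute percentage error over the prediction horizon."""
    return max(abs(e) for e in percent_errors(predicted, actual))


if __name__ == "__main__":
    for model, values in PREDICTED.items():
        errors = percent_errors(values, ACTUAL)
        line = "  ".join(f"{q}: {e:+.2f}%" for q, e in zip(QUARTERS, errors))
        print(f"Model {model:<3} {line}")
```

Reading the errors quarter by quarter, rather than at the single point II-73, is what allows the three models to be told apart.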



The predictive power of each model in predicting the size
distribution of the population could not be compared as the validation sample
was also used in estimating the parameters of MOD III. Thus, another
sample would have to be taken to validate this capability of the two
models. It is regrettable that this step could not be carried out at the
time of the writing of this report because of lack of time. It is therefore
proposed that the models be validated again at a later date.

V. SUMMARY AND CONCLUSIONS



A. SUMMARY 

The purpose of this research has been to develop a model that can
be used to study the structure of a population of savings accounts in a
savings institution and to predict future levels of savings in the
institution.

Two stochastic models were developed and evaluated in this study.
The first model was based on the time stationary Markov chain model
extended to cover the phenomena of opening and closing of accounts.
The population was divided into ten classes and the continuous
distribution of amount of savings of each account was idealized by a discrete
distribution with ten classes. The classes were numbered from two to
eleven. The class intervals of Classes II to IX were $2,000. Class X
contained all accounts with balances between $16,000 and $19,999 and
Class XI contained all accounts with balances between $20,000 and
$100,000. Class I was used as a reservoir for all the accounts that had
closed. The parameters of Model I were assumed to be constant over the
period of observation and prediction.
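
The class scheme described above can be sketched as a small lookup function. This is only an illustration: the function name is hypothetical, and it assumes that Class II starts at a zero balance and that class boundaries are closed on the lower side, which the text does not state explicitly.

```python
def savings_class(balance, closed=False):
    """Return the class number (1 to 11) of an account.

    Class I (1) is the reservoir for closed accounts. Classes II to IX are
    $2,000 wide, Class X covers $16,000 to $19,999 and Class XI covers
    $20,000 to $100,000. Lower bounds are assumed inclusive.
    """
    if closed:
        return 1
    if balance < 0 or balance > 100_000:
        raise ValueError("balance outside the range covered by the study")
    if balance >= 20_000:
        return 11
    if balance >= 16_000:
        return 10
    # Classes II..IX: $0-$1,999 -> 2, $2,000-$3,999 -> 3, ..., $14,000-$15,999 -> 9
    return 2 + int(balance // 2_000)
```

Accounts above $100,000 are rejected rather than classified, since the study excludes them.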

The second model was based on the nonstationary Markov chain
model. The parameters were not assumed to be constant. An econometric
model was used to relate the estimates of the parameters to a set of
exogenous variables. Predictors of the parameters, if found to be
significant, were used to predict future values of the parameters.

By assuming that the mean of the amount of savings of accounts in
each class remains constant with time, the total amount of savings in each
class could be computed by multiplying the number of accounts in each
class by the mean.
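
The computation of total savings from class counts and class means can be sketched as follows. The numbers in the usage example are invented for illustration only and are not taken from the study.

```python
def total_savings(counts, class_means):
    """Total savings = sum over classes of (number of accounts * mean balance)."""
    if len(counts) != len(class_means):
        raise ValueError("counts and class_means must have the same length")
    return sum(n * mean for n, mean in zip(counts, class_means))


if __name__ == "__main__":
    # Hypothetical figures: three classes with assumed mean balances.
    counts = [120, 45, 10]
    class_means = [900.0, 2950.0, 5100.0]
    print(total_savings(counts, class_means))
```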

The parameters of the two models were estimated with data obtained
from the local branch of a savings institution. The level of savings of a
stratified sample of 622 accounts was observed over a period of ten
quarters, Quarter I-71 to Quarter II-73. Movements of accounts between
classes were recorded as transitions between the respective classes. The
transition probability matrix was estimated by dividing the number of
transitions from each class to each destination class by the total number
of accounts in the originating class at the beginning of the quarter.
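
The estimation step just described amounts to normalizing each row of a transition frequency matrix. A minimal sketch follows; the function name is illustrative, and leaving an empty class as a row of zeros is an assumption made here, not a rule from the thesis.

```python
def estimate_transition_matrix(counts):
    """Estimate p[i][j] = x[i][j] / n[i] from transition counts.

    counts[i][j] is the number of accounts that moved from class i to
    class j during the quarter; the row sum n[i] is the number of accounts
    in class i at the beginning of the quarter.
    """
    matrix = []
    for row in counts:
        n_i = sum(row)
        if n_i == 0:
            # Assumption: no accounts observed in this class, so no estimate.
            matrix.append([0.0] * len(row))
        else:
            matrix.append([x / n_i for x in row])
    return matrix
```

Each nonempty row of the result sums to one, as required of a transition probability matrix.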

The total number of new accounts opened in each quarter of the
period of observation was used to estimate the arrival rate or expected
number of new accounts per quarter.

Two hundred and fifty new accounts were randomly selected each
quarter. These were used to determine if the size distribution of new
accounts had changed during the period of observation. These accounts
were classified into the ten classes described earlier and the probability
of a new account being in each class was estimated. These estimates were
transformed into logits and regressed against a set of exogenous variables.
The regressions that were considered significant were used as predictors
for future values of the probability of a new account entering a particular
class.
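
The logit transformation mentioned above maps a probability in (0, 1) onto the whole real line, so that a regression prediction can always be mapped back to a valid probability. A minimal sketch, with illustrative function names:

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(y):
    """Map a predicted logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-y))
```

A regression would be fitted to `logit(p)` and its forecasts passed through `inverse_logit` to recover probabilities.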
The structure of the population of savings accounts for Quarter I-72
was determined and used as the initial distribution in predictions of the
behavior of the population.

The chi square test was used to determine if the transition matrix 
had changed during the period of observation and if the predicted size 
distributions matched the observed distributions. 
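
The goodness of fit comparison rests on the usual chi square statistic, the sum over classes of (observed - expected)^2 / expected. A minimal sketch, with an illustrative function name:

```python
def chi_square_statistic(observed, expected):
    """Pearson chi square statistic for observed versus expected class counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The statistic is then compared with a percentile of the chi square distribution with the appropriate degrees of freedom.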

The parameters of Model I were estimated using data from the first
five quarters. The model was then used to predict the size distribution
of accounts of the sample and the amount of savings in the sample
population.

The size distribution of the population of savings accounts was
predicted using the distribution of the population at Quarter I-72 as the
initial distribution. Total number of accounts and total amount of savings
were also predicted.

Most of the parameters of Model II were estimated using data from
the first five quarters. Of 110 transition probabilities, 10 were found to
vary significantly with the set of exogenous variables. Thus the transition
matrix of Model II contained only ten predicted elements. The predictors
were determined using data from the first eight quarters.

Model II was used to predict the size distribution of accounts in 
the sample and the amount of savings in the sample. It was then used 
to predict the behavior of the population. 






A sample comprising one fourth of the population of Quarter I-73
was used to test whether the size distributions predicted by both models
matched the observed distribution. Predicted total number of accounts and
total amount of savings were also tested by comparison with actual values
observed over the prediction horizon.

B. CONCLUSIONS 
1. Model I 

The hypothesis that the stochastic processes were stationary 
during the period of observation could not be rejected at the ten percent 
level of significance. Thus the assumption of stationarity could be con- 
sidered to hold. 

The predicted size distribution of the sample matched the
observed distribution closely. The largest chi square statistic obtained
was 11.91. This corresponded to the seventieth percentile of the chi
square distribution with ten degrees of freedom. It was concluded that
the sample of 622 accounts behaved as described by the Markov chain
model.
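
The percentile quoted for the statistic 11.91 can be checked numerically. For an even number of degrees of freedom the chi square distribution function has a closed form, which the sketch below uses; the function name is illustrative and the closed form is restricted to even degrees of freedom.

```python
import math


def chi_square_cdf_even_df(x, df):
    """P(chi square with `df` degrees of freedom <= x), for even df only.

    Uses the identity CDF = 1 - exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total


if __name__ == "__main__":
    print(round(chi_square_cdf_even_df(11.91, 10), 3))
```

With ten degrees of freedom the value for 11.91 comes out close to 0.71, in line with the seventieth percentile stated above.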

The predicted total amount of savings differed from the actual 
amount by a maximum of ten percent. It was concluded that Model I could 
predict total amount of savings but the variability in the prediction could 
be rather large as a small number of savers with large accounts could 
cause large fluctuations in the total amount of savings. 

Model I failed to predict the behavior of the population. The
failure was attributed to errors in estimation of parameters of the transition
matrix. This observation was supported by the fact that predictions
were substantially improved by changing the values of some transition
probabilities. The additional data in the validation sample was used to
adjust the estimates of a few transition probabilities. Predictions of
total amount of savings made with this modified matrix were greatly
improved. The maximum error was found to be half a percent. A good
fit between predicted and actual total amount of savings by itself is not
sufficient to indicate that the model has predicted the size distribution of
the population correctly. However, as the predicted size distribution
of the population of Quarter I-73 has been made to fit the observed
distribution, and if the structure of the population did not change drastically
over the period of observation, then it is plausible that the true transition
matrix is not very different from the modified matrix. It is regrettable
that time did not permit the drawing of further samples to validate the
model so that a firmer conclusion could be reached.

The fundamental matrix, obtained from the 'best' estimate of 
the transition matrix, predicted that the maximum total number of accounts 
in the institution will be 19363, and the maximum total amount of savings 
contributed by accounts below $100,000 will be $53.74 million, if the 
conditions existing during the period of the data were to persist. 

The average time an account remains open was predicted
to be 27.6 quarters (6.9 years). The expected length of stay of an account
in the system appeared to be independent of the amount of savings in
the account when it first joined the system, except if the amount was
less than $2,000 or more than $20,000. It was concluded that a saver's
desire to remain a customer of the institution did not depend on his initial
deposit.

A small increase in the expected length of stay of an account
in the system could have a large effect on the total amount of savings.
Thus efforts to keep customers contented, so that they remain longer in
the system, are important.
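
The equilibrium and length-of-stay figures above come from the fundamental matrix of the absorbing chain. The sketch below shows the standard construction, N = (I - Q)^(-1), where Q holds the transition probabilities among the open classes. It assumes a constant arrival vector per quarter and the recursion n(t+1) = n(t)Q + a, which is one common formulation and may differ in detail from the thesis. All names and the two-class example are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fundamental_matrix(Q):
    """N = (I - Q)^(-1); N[i][j] is the expected number of quarters spent in j."""
    n = len(Q)
    return invert([[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
                   for i in range(n)])


def expected_stay(Q):
    """Expected quarters before closing, by class of entry (row sums of N)."""
    return [sum(row) for row in fundamental_matrix(Q)]


def equilibrium_counts(arrivals, Q):
    """Fixed point of n = nQ + a, that is n* = aN (assumed arrival model)."""
    N = fundamental_matrix(Q)
    n = len(Q)
    return [sum(arrivals[i] * N[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    # Hypothetical two-class example; each class closes with probability 0.05.
    Q = [[0.90, 0.05], [0.10, 0.85]]
    print(expected_stay(Q))
    print(equilibrium_counts([8.0, 2.0], Q))
```

In this toy example the equilibrium total equals the arrival rate times the expected stay, which illustrates why a small increase in length of stay raises the equilibrium level of savings.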

2. Model II

The predicted size distributions of the sample were very close
to the observed distribution for the first four periods. The maximum chi
square statistic was 6.7, which is less than the thirtieth percentile of
the chi square distribution with ten degrees of freedom. The chi square
statistic for the fifth quarter, Quarter II-73, rose sharply to 25.02. An
investigation showed that the model failed because five of the predictors of
transition probability were used beyond the data base on which they were
derived, thus giving erroneous predictions for Quarter II-73. It was
therefore concluded that Model II could predict accurately provided the
predictors are not required to predict beyond the data base on which they
were derived.

The maximum error in predicting the total amount
of savings was about ten percent. The predictions were very close to the
predictions made by Model I.

Model II fared no better than Model I in the prediction of 
population behavior and for the same reasons as stated earlier. 



3. Discussion



Both models performed creditably in predicting the behavior of
the sample of 622 accounts. This is encouraging as it leads one to
conclude that a population of savers does possess the Markovian property.

Failure of the models to predict the behavior of the entire
population correctly was attributed to errors in the estimation of
parameters. This explanation is plausible, as modifications to the transition
matrix, using additional data from the validation sample, yielded
predictions of total amount of savings that were accurate to half a percent.
As it is difficult to conceive how a random sample could exhibit
Markovian behavior without the population possessing that characteristic,
one is further led to believe in the above explanation.

If external conditions do not have much influence on the be- 
havior of the population of savers then Model I, because of its simplicity, 
is the ideal model to use. Model I could still be used if the rate of 
change of the population behavior is slow. Transition probabilities 
could be estimated each quarter and exponential smoothing used to adjust 
the past estimates with this additional information. However, this model 
does not allow the use of additional information regarding the operating 
environment to improve the predictions. 
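
The quarterly updating by exponential smoothing suggested above can be sketched as an element-by-element blend of the previous estimate with the newly observed matrix. The smoothing constant `alpha` and the function name are assumptions for illustration; the thesis does not specify a value.

```python
def smooth_transition_matrix(previous, observed, alpha):
    """Return alpha * observed + (1 - alpha) * previous, element by element.

    Because both inputs have rows summing to one, the blended rows also
    sum to one, so the result is still a transition probability matrix.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [[alpha * o + (1.0 - alpha) * p for p, o in zip(prev_row, obs_row)]
            for prev_row, obs_row in zip(previous, observed)]
```

A small `alpha` gives a slowly adapting matrix, which suits the case where population behavior changes slowly.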

Model II has not been given an opportunity to demonstrate
its capability because of the limited data base. It has the advantage
of improvement with additional knowledge of the operating environment.
However, its main limitation is in the requirement of predictions of
values of exogenous variables to predict future values of the parameters
of the model. Thus, predictions of Model II are only as good as
predictions of exogenous variables. The success of the model, therefore,
depends to a great extent on the judgement of the forecaster.

4. Areas for Further Research

The Markovian property of a population is an important popu- 
lation characteristic. The results observed in the application of the models 
to the sample should be verified using a larger number of accounts, pref- 
erably the entire population. A computerized bookkeeping system should 
be able to take on the additional task of counting the number of transitions 
between classes without much additional effort. 

The variability of predictions in total amount of savings 
could be reduced if the movement of large accounts could be predicted. 
Accounts with a balance exceeding $100,000 could be the subject of 
another study. 

The present study did not deal with the interaction between 
various types of accounts in a savings association. Movement of accounts 
between different types of accounts has an impact on the total amount of 
savings in the institution. This area merits further research especially 
if management desires to know the future level of savings of the whole 
institution.

The variance of the predictions for more than one period is
difficult to derive as the elements of the transition matrix are sums of
products of normal random variables, when the sample size is large.
An alternate approach would be to use the Monte Carlo method to obtain
an estimate of the variance.
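
One way the Monte Carlo approach could be carried out is sketched below. It is an assumption about how such a study might be set up, not the author's procedure: the observed transition counts are resampled row by row from the estimated multinomial distributions, the matrix is re-estimated, and the multi-period prediction is repeated to obtain an empirical variance. The counts and initial distribution are invented for illustration, with state 0 standing for closed accounts.

```python
import random
import statistics


def normalize(counts):
    """Row-normalize transition counts into estimated probabilities."""
    return [[c / sum(row) for c in row] for row in counts]


def predict_active(initial, P, periods):
    """Expected number of open accounts after `periods` steps (state 0 = closed)."""
    dist = list(initial)
    m = len(P)
    for _ in range(periods):
        dist = [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]
    return sum(dist[1:])


def resample_counts(counts, rng):
    """Draw a new count matrix, row by row, from the estimated multinomials."""
    P = normalize(counts)
    m = len(counts)
    resampled = []
    for i, row in enumerate(counts):
        draws = rng.choices(range(m), weights=P[i], k=sum(row))
        resampled.append([draws.count(j) for j in range(m)])
    return resampled


def monte_carlo_prediction(counts, initial, periods, trials=500, seed=1):
    """Mean and variance of the multi-period prediction over resampled matrices."""
    rng = random.Random(seed)
    samples = [
        predict_active(initial, normalize(resample_counts(counts, rng)), periods)
        for _ in range(trials)
    ]
    return statistics.mean(samples), statistics.variance(samples)


if __name__ == "__main__":
    # Hypothetical data: row 0 keeps closed accounts closed.
    COUNTS = [[1, 0, 0], [10, 170, 20], [5, 15, 80]]
    INITIAL = [0.0, 200.0, 100.0]
    print(monte_carlo_prediction(COUNTS, INITIAL, periods=4))
```

The spread of the simulated predictions reflects only the sampling error in the estimated transition matrix; other sources of variation would have to be simulated separately.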

The specification of the econometric models used in predicting
the transition probabilities, arrival rate and distribution of new accounts
does not imply that the true relationships between parameters of the model
and exogenous variables are as specified. This study has merely scratched
the surface of the problem of identifying causal relationships between the
parameters of the model and external factors. Further research in this
area is necessary before reliable predictors can be developed for the
parameters.

C. RECOMMENDATIONS 

Model I can be turned into an operational tool with little effort. It
is recommended that the parameters of the model be updated each quarter
to reflect slight changes that may have taken place. If possible, the
entire population should be used to estimate the parameters.

Model II can be made operational only after further research has 
been conducted to determine the predictors of the parameters of the model. 
APPENDIX A

DERIVATION OF THE VARIANCES OF NUMBER OF ACCOUNTS
AND AMOUNT OF SAVINGS IN THE POPULATION FOR SINGLE
STEP TRANSITION



(1) EXPECTATION, VARIANCE AND COVARIANCE OF RANDOM SUMS

Let
    $N$ be an integer random variable
    $M$ be an integer random variable
    $X_i$ be i.i.d.
    $Y_j$ be i.i.d.

\[ X = \sum_{i=1}^{N} X_i , \qquad Y = \sum_{j=1}^{M} Y_j \]

\[ E(XY) = E\Big( \sum_{i=1}^{N} X_i \sum_{j=1}^{M} Y_j \Big)
         = E\Big( \sum_{i=1}^{N} \sum_{j=1}^{M} X_i Y_j \Big)
         = E(MN)\,E(X_i Y_j) \]

\[ \mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)
                     = E(MN)\,E(X_i Y_j) - E(N)E(X_i)\,E(M)E(Y_j) \]

If $X_i$ and $Y_j$ are uncorrelated then

\[ \mathrm{Cov}(X,Y) = E(MN)\,E(X_i)E(Y_j) - E(N)E(M)\,E(X_i)E(Y_j) \]
\[                   = E(X_i)E(Y_j)\big( E(MN) - E(M)E(N) \big) \]
\[                   = E(X_i)E(Y_j)\,\mathrm{Cov}(M,N) \]

\[ \mathrm{Var}(X) = E^2(X_i)\,\mathrm{Var}(N) + E(N)\,\mathrm{Var}(X_i) \]

Note: $E(X) = E(N)E(X_i)$ can be derived as follows:

\[ E(X) = \sum_{n=0}^{\infty} E(X \mid N=n)\,P(N=n)
        = \sum_{n=0}^{\infty} n\,E(X_i)\,P(N=n)
        = E(N)\,E(X_i) \]
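
The random-sum moments of part (1) can be checked by simulation. The sketch below is only an illustration under assumed distributions chosen for convenience: N is Binomial(20, 0.5), so E(N) = 10 and Var(N) = 5, and the X_i are Uniform(0, 1), so E(X_i) = 0.5 and Var(X_i) = 1/12. The formulas then give E(X) = 5 and Var(X) = 0.25 * 5 + 10/12, about 2.083.

```python
import random
import statistics


def simulate_random_sums(trials=20000, seed=7):
    """Simulate X = X_1 + ... + X_N and return its sample mean and variance."""
    rng = random.Random(seed)
    sums = []
    for _ in range(trials):
        n = sum(1 for _ in range(20) if rng.random() < 0.5)  # N ~ Binomial(20, 0.5)
        sums.append(sum(rng.random() for _ in range(n)))      # X_i ~ Uniform(0, 1)
    return statistics.mean(sums), statistics.variance(sums)


def theoretical_moments(e_n, var_n, e_x, var_x):
    """E(X) = E(N)E(X_i) and Var(X) = E^2(X_i)Var(N) + E(N)Var(X_i)."""
    return e_n * e_x, e_x ** 2 * var_n + e_n * var_x


if __name__ == "__main__":
    print(simulate_random_sums())
    print(theoretical_moments(10.0, 5.0, 0.5, 1.0 / 12.0))
```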



(2) EXPECTATION AND VARIANCE OF NUMBER OF ACCOUNTS

Let
    $n_i$         = number of accounts in the ith class at beginning
                    of time period a.
    $p_{ij}$      = transition probability between classes i and j,
                    $i = 2, 3, \ldots, m$, $j = 1, 2, \ldots, m$.
    $x_{ij}$      = number of transitions between classes i and j
                    during period a.
    $n_j^{a+1}$   = number of accounts in the jth class at beginning
                    of time period a+1.
    $N^{a+1}$     = total number of accounts in the system at
                    beginning of time period a+1.

The assumption that accounts moving out of a class are distributed in
accordance with a multinomial distribution with parameters
$(p_{i1}, p_{i2}, \ldots, p_{im})$ is implicit in the Markov chain model. If it
can be further assumed that accounts moving out of different classes are
independent then the following expressions could be obtained.

\[ n_j^{a+1} = \sum_{i=2}^{m} x_{ij} \]

\[ E(n_j^{a+1}) = \sum_{i=2}^{m} E(x_{ij}) = \sum_{i=2}^{m} n_i p_{ij} \]

\[ \mathrm{Var}(n_j^{a+1}) = \sum_{i=2}^{m} \mathrm{Var}(x_{ij})
                           = \sum_{i=2}^{m} n_i p_{ij}(1 - p_{ij}) \]

since $\mathrm{Cov}(x_{ij}, x_{kj}) = 0$ for $i \neq k$ by assumption of
independence between accounts exiting from different classes.

\[ N^{a+1} = \sum_{j=2}^{m} n_j^{a+1} \]

\[ E(N^{a+1}) = \sum_{j=2}^{m} \sum_{i=2}^{m} n_i p_{ij} \]

\[ \mathrm{Var}(N^{a+1}) = \sum_{j=2}^{m} \mathrm{Var}(n_j^{a+1})
      + 2 \sum_{j=2}^{m-1} \sum_{\substack{k=3 \\ k > j}}^{m}
        \mathrm{Cov}(n_j^{a+1}, n_k^{a+1}) \]

\[ \mathrm{Cov}(n_j^{a+1}, n_k^{a+1})
      = \mathrm{Cov}\Big( \sum_{i=2}^{m} x_{ij}, \sum_{l=2}^{m} x_{lk} \Big) \]
\[    = E\Big( \sum_{i=2}^{m} x_{ij} \sum_{l=2}^{m} x_{lk} \Big)
        - E\Big( \sum_{i=2}^{m} x_{ij} \Big) E\Big( \sum_{l=2}^{m} x_{lk} \Big) \]
\[    = \sum_{i=2}^{m} \sum_{l=2}^{m} E(x_{ij} x_{lk})
        - \sum_{i=2}^{m} \sum_{l=2}^{m} E(x_{ij}) E(x_{lk}) \]
\[    = \sum_{i=2}^{m} \sum_{l=2}^{m} \mathrm{Cov}(x_{ij}, x_{lk}) \]

By assumption $\mathrm{Cov}(x_{ij}, x_{lk}) = 0$ if $i \neq l$, so

\[ \mathrm{Cov}(n_j^{a+1}, n_k^{a+1}) = \sum_{i=2}^{m} \mathrm{Cov}(x_{ij}, x_{ik}) \]

As $x_{ij}$ and $x_{ik}$ are multinomial random variables from the same distribution,

\[ \mathrm{Cov}(x_{ij}, x_{ik}) = - n_i p_{ij} p_{ik} \]

\[ \mathrm{Cov}(n_j^{a+1}, n_k^{a+1}) = \sum_{i=2}^{m} - n_i p_{ij} p_{ik} ,
   \qquad j \neq k \]


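(3) EXPECTATION AND VARIANCE OF AMOUNT OF SAVINGS

Let
    $z_{kj}$      = size of the kth account that has entered the jth class
    $Z_j^{a+1}$   = amount of savings in class j at the beginning of
                    period a+1
    $Z^{a+1}$     = total amount of savings in the system at the beginning
                    of period a+1

\[ Z_j^{a+1} = \sum_{i=2}^{m} \sum_{k=1}^{x_{ij}} z_{kj} \]

Using results from (1),

\[ E(Z_j^{a+1}) = \sum_{i=2}^{m} E(x_{ij})\,E(z_{kj}) \]

\[ \mathrm{Var}(Z_j^{a+1})
      = \sum_{i=2}^{m} \mathrm{Var}\Big( \sum_{k=1}^{x_{ij}} z_{kj} \Big)
      + 2 \sum_{i=2}^{m-1} \sum_{\substack{l=3 \\ l > i}}^{m}
        \mathrm{Cov}\Big( \sum_{k=1}^{x_{ij}} z_{kj}, \sum_{k=1}^{x_{lj}} z_{kj} \Big) \]
\[    = \sum_{i=2}^{m} \Big[ E(x_{ij})\,\mathrm{Var}(z_{kj})
        + E^2(z_{kj})\,\mathrm{Var}(x_{ij}) \Big]
      + 2 \sum_{i=2}^{m-1} \sum_{\substack{l=3 \\ l > i}}^{m}
        E^2(z_{kj})\,\mathrm{Cov}(x_{ij}, x_{lj}) \]
\[    = \sum_{i=2}^{m} \Big[ n_i p_{ij}\,\mathrm{Var}(z_{kj})
        + E^2(z_{kj})\, n_i p_{ij}(1 - p_{ij}) \Big] \]

The covariance terms drop out as $\mathrm{Cov}(x_{ij}, x_{lj}) = 0$ if $i \neq l$.

\[ \mathrm{Cov}(Z_j^{a+1}, Z_l^{a+1})
      = \mathrm{Cov}\Big( \sum_{i=2}^{m} \sum_{k=1}^{x_{ij}} z_{kj},
                          \sum_{h=2}^{m} \sum_{n=1}^{x_{hl}} z_{nl} \Big) \]
\[    = \sum_{i=2}^{m} \sum_{h=2}^{m}
        \mathrm{Cov}\Big( \sum_{k=1}^{x_{ij}} z_{kj}, \sum_{n=1}^{x_{hl}} z_{nl} \Big) \]
\[    = \sum_{i=2}^{m} E(z_{kj})\,E(z_{nl})\,\mathrm{Cov}(x_{ij}, x_{il}) \]
\[    = \sum_{i=2}^{m} E(z_{kj})\,E(z_{nl})\,(- n_i p_{ij} p_{il}) \]

\[ \mathrm{Var}(Z^{a+1}) = \sum_{j=2}^{m} \mathrm{Var}(Z_j^{a+1})
      + 2 \sum_{j=2}^{m-1} \sum_{\substack{l=3 \\ l > j}}^{m}
        \mathrm{Cov}(Z_j^{a+1}, Z_l^{a+1}) \]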
TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 2 AND QUARTER
SUM 21 175 92 65 35 33 65 34 33 24 28 605

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 3 AND QUARTER
SUM 20 174 89 62 36 28 61 29 27 24 34 584

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 4 AND QUARTER
SUM 26 168 90 55 37 20 60 30 19 23 35 563

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 5 AND QUARTER
SUM 18 169 80 51 37 22 57 21 22 20 40 537

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 6 AND QUARTER
SUM 17 160

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 7 AND QUARTER
SUM 13 153 76 57 30 17 45 23 25 26 33 502

TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 8 AND QUARTER
SUM 13 148 78 53 30 21 44 23 21 25 33 485



TRANSITION FREQUENCY MATRIX BETWEEN QUARTER 9 AND QUARTER 10

SUM 25 141 69 45 31 15 46 25 17 22 40 476



ESTIMATE OF TRANSITION MATRIX BETWEEN QUARTER 2 AND QUARTER
0385 0.0385 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0769 0.8462

ESTIMATE OF TRANSITION MATRIX BETWEEN QUARTER 3 AND QUARTER

ESTIMATE OF TRANSITION MATRIX BETWEEN QUARTER 9 AND QUARTER 10

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 17 182 92 60 50 24 76 38 29 28 26 622

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 38 357 184 125 85 57 141 72 62 52 54 1227

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 58 531 273 187 121 85 202 101 89 76 88 1811



CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER

SUM 84 699 363 242 158 105 262 131 108 99 123 2374



CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 102 868 443 293 195 127 319 152 130 119 163 2911

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 119 1028 523 343 233 147 369 179 151 140 198 3430

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 132 1181 599 400 263 164 418 202 176 166 231 3932

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER
SUM 145 1329 677 453 293 185 462 225 197 191 264 4421

CUMULATIVE TRANSITION FREQUENCY MATRIX OF QUARTER 1 TO QUARTER 10